dms-on-lxd
Last updated: 2024-11-21 22:06:44.019196
Introduction
This project uses Terraform to provision LXD instances that host multiple DMS executions.
The ultimate goal is a generic, flexible provisioning standard for setting up DMS clusters wherever the LXD API is accessible.
Prerequisites
Remote
LXD API enabled on the target hosts. See HOW-TO: expose LXD to the network.
For legacy versions you can refer to Directly interacting with the LXD API. This authentication method has been removed in recent versions of LXD.
Locally
Dependencies:
LXD CLI, which must provide the lxc command line interface.
DMS Deb file. To download the latest DMS release, refer to the DMS installation guide.
yq, a jq wrapper
This project can alternatively be run using the provided Dockerfile.
If using docker:
build the image
run the image
You can use the Dockerfile as a complete reference for all the dependencies that must be present in order to run this project.
Usage
Using Docker
You can run all commands through Docker after building the image:
The -v ~/.ssh:/root/.ssh:ro mount is optional but useful if you need SSH access to the host machine.
Configuring
First copy the configuration dist file:
Then modify the values accordingly:
If desired, you can customize the number of DMS deployments by adding dms_instances_count to the config.yml file:
If omitted, one DMS instance per LXD host is deployed by default.
SSH Key
An SSH key called lxd-key (and lxd-key.pub) is created and used for the deployment of the instances. If you want to override the key, add the following to terraform.tfvars:
Other terraform variables
The default Terraform variables can be seen in variables.tf. Customizing their default values is optional.
To customize a variable, for instance the path to the DMS Deb file, which is "dms_deb_filepath", add this line to terraform.tfvars. Create the file if it doesn't exist:
For a complete list of variables, check the file variables.tf.
Nebula
This project also supports nunet/nebula, a project based on slackhq/nebula.
Note that the nunet project is private as of the time of writing this document.
To enable the use of nebula, add the following to terraform.tfvars:
Then provide the necessary nebula users with their associated IPs by adding them to the config.yml file:
Note that you must provide at least as many users as there are DMS instances to be deployed; otherwise the execution will fail.
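The constraint above can be checked before running the deployment. The following is a minimal sketch, not part of this project: the function name check_nebula_users, the shape of the user entries, and the sample values are all hypothetical, and the real config.yml keys may differ.

```python
def check_nebula_users(nebula_users, expected_instances):
    """Raise ValueError when fewer nebula users than DMS instances are configured."""
    if len(nebula_users) < expected_instances:
        raise ValueError(
            f"{len(nebula_users)} nebula user(s) configured, "
            f"but {expected_instances} DMS instance(s) are expected"
        )
    return True


# Hypothetical data: the real config.yml keys and values may differ.
users = [
    {"name": "user-a", "ip": "10.0.0.1"},
    {"name": "user-b", "ip": "10.0.0.2"},
]
check_nebula_users(users, expected_instances=2)
```

Passing the expected instance count explicitly keeps the check independent of how that number is derived from the configuration.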
Running
NOTE: If using docker, run these inside the container.
Spin up the cluster using bash make.sh. NOTE: make isn't actually used for the deployment.
Use lxd_vm_addresses.txt to connect and execute code in the remote instances:
When done, destroy the infrastructure using bash destroy.sh.
Outputs
The following files are produced after running this project with make.sh.
add-lxd-remotes.sh
This script is a helper to add the lxd remote servers to your local lxd client in order to help with managing remote instances.
It looks something like this:
Upon execution, the remotes are added to your local machine. You can then list the virtual machines in each remote:
You can then terminate instances at will, for example if the opentofu component enters an inconsistent state while you are using this project:
reachable_hosts.yml
This is a list of hosts that have been tested and are reachable from the machine where this project is being executed:
unreachable_hosts.yml
This is a list of unreachable hosts that failed to respond during testing:
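How this project tests reachability is not described here. As a rough illustration of the idea only, the sketch below splits a list of hosts by whether a TCP connection succeeds. The function names are hypothetical, and the default port 8443 is an assumption based on the port commonly used when exposing the LXD API; adjust it to your setup.

```python
import socket


def is_reachable(host, port=8443, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def split_hosts(hosts, port=8443, timeout=3.0):
    """Partition hosts into (reachable, unreachable) lists, preserving order."""
    reachable, unreachable = [], []
    for host in hosts:
        if is_reachable(host, port, timeout):
            reachable.append(host)
        else:
            unreachable.append(host)
    return reachable, unreachable
```

A successful TCP connection only shows that something is listening on the port; it does not verify that the LXD API accepts your credentials.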
vms-ipv4-list.txt
This is a list of the IPv4 addresses available for connection after provisioning the infrastructure. It is a simple file with one IP per line, which can easily be iterated over using bash or any other language, such as Python.
If nebula is enabled, these IPs are replaced with the internal IPs of nebula, assigned to each VM:
To iterate over the list for connecting over ssh using bash:
To process the file in a script, for example in Python:
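A minimal Python sketch for reading the file, assuming only what is stated above: one IPv4 address per line. The function names parse_ipv4_list and load_ipv4_list are hypothetical, and the validation step is an addition for illustration rather than something this project requires.

```python
import ipaddress


def parse_ipv4_list(text):
    """Parse one IPv4 address per line; raise ValueError on an invalid entry."""
    addresses = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # IPv4Address raises AddressValueError (a ValueError) on bad input.
        addresses.append(str(ipaddress.IPv4Address(line)))
    return addresses


def load_ipv4_list(path="vms-ipv4-list.txt"):
    """Read the file produced by make.sh and return its addresses as strings."""
    with open(path, encoding="utf-8") as handle:
        return parse_ipv4_list(handle.read())
```

Typical usage would be a loop such as `for ip in load_ipv4_list(): ...`, run from the directory where make.sh wrote the file.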
Known issues
Docker and LXD
In some cases Docker prevents LXD instances from communicating with the internet consistently. The issue manifests as follows: the user can upgrade and install packages with APT, but anything else hangs indefinitely.
To overcome this, add the following rules to iptables (using sudo whenever necessary):
Virtual Machines (SOLVED)
NOTE: in the current state, virtual-machine and container work with the same Terraform file at the expense of an asynchronous installation of DMS. Beware that Terraform will therefore return successfully while code is still running inside the LXD instances. Check either /var/log/cloud-init-output.log or /var/log/init.log inside each LXD instance to see whether the installation finished successfully.
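One way to inspect the log without opening an interactive session is sketched below. It builds an ssh command that prints the last lines of /var/log/cloud-init-output.log. The function names, the root login user, and the default of 20 lines are assumptions for illustration; lxd-key is the key file named in the SSH Key section.

```python
import subprocess

CLOUD_INIT_LOG = "/var/log/cloud-init-output.log"


def build_log_command(ip, user="root", key_path="lxd-key", lines=20):
    """Build an ssh command that prints the last lines of the cloud-init log."""
    return [
        "ssh", "-i", key_path,
        f"{user}@{ip}",  # the login user is an assumption, adjust as needed
        "tail", "-n", str(lines), CLOUD_INIT_LOG,
    ]


def tail_cloud_init_log(ip, **kwargs):
    """Run the command and return the log tail as text (requires SSH access)."""
    result = subprocess.run(
        build_log_command(ip, **kwargs),
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Keeping the command construction separate from its execution makes it easy to print the command first and confirm it matches your environment.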
Long explanation of the issue and solution
virtual-machine wasn't working before because the file block in lxd_instance resources expects to be able to provision files while the LXD instance is still in a stopped state. This works for containers because of the nature of their filesystem (overlayfs or similar), but not for virtual machines, whose filesystem isn't accessible as a plain directory. Using the lxd_instance_file resource, which uploads a file once the instance is up and running, solves the issue. However, exec blocks in the lxd_instance resource, which run synchronously with Terraform, won't work with lxd_instance_file if they depend on the file, because their execution can't be delayed until the file is provisioned. Therefore we have to leverage cloud-init's runcmd for that, which runs in the background after Terraform returns.
Terraform docs
Generating docs
This part of the documentation is generated automatically from terraform using terraform-docs.
To update it, run:
Requirements
Providers
Modules
No modules.
Resources
Inputs
Outputs