dms-on-lxd

Last updated: 2024-11-21 22:06:44.019196 File source: link on GitLab

Introduction

This project uses Terraform to provision LXD instances on which multiple DMS instances will run.

The ultimate goal is a generic, flexible provisioning standard for setting up DMS clusters wherever the LXD API is accessible.

Prerequisites

Remote

Locally

Dependencies:

This project can alternatively be run using the provided Dockerfile.

If using docker:

  1. build the image

docker build -t dms-on-lxd .
  2. run the image

docker run -it --rm -v $PWD:/app dms-on-lxd bash

You can use the Dockerfile as a complete reference for all the dependencies this project expects to be present.

Usage

Using Docker

You can run all commands through Docker after building the image:

# Build the image
docker build -t dms-on-lxd .

# Run make.sh
docker run -it --rm \
  -v $PWD:/app \
  -v ~/.ssh:/root/.ssh:ro \
  -e DMS_ON_LXD_ENV \
  -e DMS_DEB_FILE \
  dms-on-lxd bash make.sh

# Run destroy.sh
docker run -it --rm \
  -v $PWD:/app \
  -v ~/.ssh:/root/.ssh:ro \
  -e DMS_ON_LXD_ENV \
  dms-on-lxd bash destroy.sh

The -v ~/.ssh:/root/.ssh:ro mount is optional but useful if you need SSH access to the host machine.

Configuring

First copy the configuration dist file:

cp config.yml.dist config.yml

Then modify the values accordingly:

lxd_hosts:
- host: localhost
  token: randomtoken
  port: 8443
- host: 10.250.251.1
  token: randomtoken
  port: 8443

If desired, you can customize the number of DMS deployments by adding dms_instances_count to the config.yml file:

dms_instances_count: 2
lxd_hosts:
- #...

If omitted, one DMS instance per LXD host is deployed by default.

SSH Key

An SSH key pair called lxd-key (with its public key lxd-key.pub) is created and used to deploy the instances. To override the key, add the following to terraform.tfvars:

ssh_pub_key = "ecdsa-sha2-nistp256 AAAABBBBBBBBBXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBFmJ+rmL9YSVfXTEX+7P5VD6rciVYpig8BzmWJlJwdEmnuFhMyhsmtO31M2TwcW9TFNyfEsABCDEFGHI= EXAMPLE KEY"

Other terraform variables

The default terraform variables can be seen in variables.tf. Customizing their default values is optional.

To customize a variable, add it to terraform.tfvars, creating the file if it doesn't exist. For instance, to set the DMS package file through dms_deb_filepath:

dms_deb_filepath = "/full/path/to/nunet-dms-0.xxx.deb"

For a complete list of variables, check the file variables.tf.

Nebula

This project also supports nunet/nebula, a project based on slackhq/nebula.

Note that the nunet project is private as of the time of writing this document.

To enable nebula, add the following to terraform.tfvars:

enable_nebula = true

And provide the necessary nebula users with their respective associated IPs, adding them to the config.yml file:

nebula_users:
- username: nunet-test998
  password: boguspass
  ipv4: 10.251.252.253
- username: nunet-test999
  password: boguspass
  ipv4: 10.251.252.254
- # ...

Note that you must provide at least as many users as there are DMS instances to be deployed; otherwise the execution will fail.
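The constraint on user count can be checked before running the deployment. The following is a minimal sketch, not part of the project: the function name check_nebula_users and its expected_instances parameter are hypothetical, and the duplicate-address check is an extra assumption that the source does not state.

```python
def check_nebula_users(nebula_users, expected_instances):
    """Raise ValueError if there are fewer nebula users than DMS instances."""
    if len(nebula_users) < expected_instances:
        raise ValueError(
            f"need at least {expected_instances} nebula users, "
            f"got {len(nebula_users)}"
        )
    # Assumption (not from the project): each user should have a distinct IP.
    ips = [user["ipv4"] for user in nebula_users]
    if len(set(ips)) != len(ips):
        raise ValueError("duplicate ipv4 entries in nebula_users")


# Same shape as the nebula_users entries in config.yml.
users = [
    {"username": "nunet-test998", "password": "boguspass", "ipv4": "10.251.252.253"},
    {"username": "nunet-test999", "password": "boguspass", "ipv4": "10.251.252.254"},
]
check_nebula_users(users, expected_instances=2)  # returns without raising
```

Passing the expected instance count explicitly keeps the sketch independent of how dms_instances_count is interpreted by the Terraform code.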

Running

NOTE: If using docker, run these inside the container.

  1. Spin up the cluster using bash make.sh. NOTE: make isn't actually used for the deployment.

  2. Use lxd_vm_addresses.txt to connect and execute code in the remote instances:

for instance in $(cat lxd_vm_addresses.txt); do
  lxc exec --cwd=/opt/test-suite $instance -- {COMMAND_TO_EXECUTE}
done
  3. When done, destroy the infrastructure using bash destroy.sh.

Outputs

The following files are produced after running this project with make.sh.

add-lxd-remotes.sh

This helper script adds the LXD remote servers to your local LXD client, which makes managing remote instances easier.

It looks something like this:

#!/usr/bin/env bash

lxc remote add --accept-certificate --password securepass localhost https://localhost:8443
lxc remote add --accept-certificate --token randomtoken localhost https://10.251.252.1:8443
# ...

Upon execution, the remotes are added to your local machine. You can then list the virtual machines in each remote:

$ lxc ls localhost:       
+--------------------+---------+-------------------------+-------------------------------------------------+-----------------+-----------+
|        NAME        |  STATE  |          IPV4           |                      IPV6                       |      TYPE       | SNAPSHOTS |
+--------------------+---------+-------------------------+-------------------------------------------------+-----------------+-----------+
| dms-0-on-localhost | RUNNING | 172.17.0.1 (docker0)    | fd42:bd31:9a92:4fb2:216:3eff:feb4:abcd (enp5s0) | VIRTUAL-MACHINE | 0         |
|                    |         | 10.167.120.14 (enp5s0)  |                                                 |                 |           |
+--------------------+---------+-------------------------+-------------------------------------------------+-----------------+-----------+
| dms-1-on-localhost | RUNNING | 172.17.0.1 (docker0)    | fd42:bd31:9a92:4fb2:216:3eff:fed2:efgh (enp5s0) | VIRTUAL-MACHINE | 0         |
|                    |         | 10.167.120.171 (enp5s0) |                                                 |                 |           |
+--------------------+---------+-------------------------+-------------------------------------------------+-----------------+-----------+

You can then terminate instances at will, for example if the OpenTofu component enters an inconsistent state while you are using this project:

$ lxc delete --force localhost:dms-0-on-localhost

reachable_hosts.yml

This is a list of hosts that have been tested and are reachable from the machine where this project is being executed:

lxd_hosts:
- {"host":"localhost","token":"randomtoken","port":8443}

unreachable_hosts.yml

This is a list of hosts that failed to respond during the reachability test:

lxd_hosts:
- {"host":"10.250.251.1","token":"randomtoken","port":8443}
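How the project tests reachability is not described here. As a rough illustration only, the split into reachable and unreachable hosts could be approximated with a plain TCP connection attempt to each host and port. The names is_reachable and partition_hosts are hypothetical, and a successful TCP connection does not prove that the LXD API accepts the token.

```python
import socket


def is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def partition_hosts(lxd_hosts, probe=is_reachable):
    """Split host entries into (reachable, unreachable) lists."""
    reachable, unreachable = [], []
    for entry in lxd_hosts:
        target = reachable if probe(entry["host"], entry["port"]) else unreachable
        target.append(entry)
    return reachable, unreachable
```

The probe parameter lets a different check (for example, an authenticated API call) be swapped in without changing the partitioning logic.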

vms-ipv4-list.txt

This is a list of the IPv4 addresses available for connection after provisioning the infrastructure. It is a simple file with one IP per line, which can be easily iterated over using bash or any other language, such as Python.

10.167.120.14
10.167.120.171

If nebula is enabled, these IPs are replaced with the internal IPs of nebula, assigned to each VM:

10.251.252.253
10.251.252.254

To iterate over the list for connecting over ssh using bash:

for ip in $(cat ./vms-ipv4-list.txt); do
  ssh -i ./lxd-key \
    -o IdentitiesOnly=yes \
    -o StrictHostKeyChecking=no \
    -o PubKeyAuthentication=yes \
    root@$ip -- echo ok
done

To process the file from a script, for example in Python:

with open("vms-ipv4-list.txt") as file_fp:
    ip_list = [line.strip() for line in file_fp.readlines()]
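Building on that list, the bash SSH loop can be mirrored in Python with subprocess. This is a sketch under the assumption that the ssh client is installed and that ./lxd-key is the key created by the project. The helper names ssh_command and run_on_all are hypothetical.

```python
import subprocess


def ssh_command(ip, key_path="./lxd-key", remote_cmd=("echo", "ok")):
    """Build the same ssh argument list as the bash loop in this section."""
    return [
        "ssh", "-i", key_path,
        "-o", "IdentitiesOnly=yes",
        "-o", "StrictHostKeyChecking=no",
        "-o", "PubKeyAuthentication=yes",
        f"root@{ip}", "--", *remote_cmd,
    ]


def run_on_all(ip_list, remote_cmd=("echo", "ok")):
    """Run remote_cmd on every IP and return a dict of {ip: exit_code}."""
    results = {}
    for ip in ip_list:
        # check=False so one failing host does not abort the whole loop.
        completed = subprocess.run(ssh_command(ip, remote_cmd=remote_cmd), check=False)
        results[ip] = completed.returncode
    return results
```

Calling run_on_all(ip_list) with the list read from vms-ipv4-list.txt returns the exit code per address, which makes it easy to spot instances that did not accept the connection.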

Known issues

Docker and LXD

There are cases where Docker prevents LXD instances from communicating with the internet consistently. The issue manifests itself as follows: the user can upgrade and install packages with APT, but anything else hangs indefinitely.

To overcome this, add the following rules to iptables (using sudo whenever necessary):

iptables -I DOCKER-USER -i lxdbr0 -j ACCEPT
iptables -I DOCKER-USER -o lxdbr0 -j ACCEPT

Virtual Machines (SOLVED)

NOTE: In the current state, virtual-machine and container instances work with the same Terraform file, at the expense of DMS being installed asynchronously. Beware that Terraform will return successfully while code is still running inside the LXD instances. Check either /var/log/cloud-init-output.log or /var/log/init.log inside each LXD instance to see whether the installation finished successfully.
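Because Terraform returns before the installation completes, a script may want to block until cloud-init is done. The sketch below assumes the lxc client is installed, that the instance is addressed as remote:name (as in the lxc ls output), and that the DMS installation runs as part of cloud-init so that cloud-init status --wait covers it. The helper names are hypothetical, and inspecting the log files mentioned above remains the authoritative check.

```python
import subprocess


def cloud_init_wait_command(instance):
    """Build an lxc exec command that blocks until cloud-init finishes."""
    return ["lxc", "exec", instance, "--", "cloud-init", "status", "--wait"]


def wait_for_install(instance, timeout=1800):
    """Return True if cloud-init exits with status 0 inside the instance.

    Raises subprocess.TimeoutExpired if it does not finish within timeout seconds.
    """
    completed = subprocess.run(
        cloud_init_wait_command(instance),
        capture_output=True, text=True, timeout=timeout, check=False,
    )
    return completed.returncode == 0
```

For example, wait_for_install("localhost:dms-0-on-localhost") would wait on the first instance shown in the earlier listing.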

long explanation of the issue and solution

The virtual-machine type didn't work before because the file block in lxd_instance resources expects to provision files while the LXD instance is still stopped. That works for containers because of the nature of their filesystem (overlayfs or similar), but not for virtual machines, whose filesystem isn't accessible as a plain directory. The lxd_instance_file resource, which uploads a file once the instance is up and running, solves this. However, exec blocks in the lxd_instance resource, which run synchronously with Terraform, can't be used for commands that depend on a file uploaded by lxd_instance_file, because execution can't be deferred until the file is provisioned. Therefore we rely on cloud-init's runcmd, which runs in the background after Terraform returns.

Terraform docs

Generating docs

This part of the documentation is generated automatically from terraform using terraform-docs.

To update it, run:

terraform-docs markdown table --output-file README.md --output-mode inject .

Requirements

Providers

Modules

No modules.

Resources

Inputs

Outputs
