feature_environment

Last updated: 2024-09-17

Introduction

This document aims to lay out the architecture supporting the current implementation of the feature environment.

The feature environment is described at https://gitlab.com/nunet/test-suite/-/tree/develop/environments/feature?ref_type=heads.

The ADRs for this architecture can be found at https://gitlab.com/nunet/test-suite/-/tree/develop/doc/architecture/decisions?ref_type=heads.

Architecture

In a nutshell, the feature environment launches virtual machines on remote hosts with DMS pre-installed. These virtual machines are designed to be accessed via SSH for remote code execution.

The main challenge in this project is to guarantee that the compute resource from which the feature environment is launched can access, authenticate with, and deploy and run remote code on the target hosts and virtual machines. To do so, an overlay network must be preconfigured in order to reach resources that are not directly exposed to the web. Even if all the virtual machine hosts were exposed to the web, this overlay network would still be useful: first to limit access to the LXD API, reducing the attack surface of the solution, and also to accommodate future domestic compute resources, once the test suite is mature enough for compute resource providers from the community.

The solution stack, described in the ADRs, consists of LXD for virtual machine management, Terraform/OpenTofu for interaction with the LXD API, Slack Nebula for the overlay network, and Gitlab CI for pipeline execution.

The LXD VM management scripts and terraform declaration can be found at https://gitlab.com/nunet/test-suite/-/tree/develop/infrastructure/dms-on-lxd?ref_type=heads

The project for Slack Nebula deployment can be found at https://gitlab.com/nunet/nebula/-/tree/main?ref_type=heads. At the time of writing the project is private, but it should be opened to public view once some sensitive aspects of the implementation are resolved.

The Gitlab CI pipeline implementation can be found at https://gitlab.com/nunet/test-suite/-/blob/develop/cicd/Feature-Environment.gitlab-ci.yml?ref_type=heads.

Diagram

The following is a diagram showing the relation between compute elements in the feature environment, namely:

  • the Gitlab Runner, which effectively executes Gitlab CI jobs

  • the LXC Remotes, which host the virtual machines containing the DMS installations for testing

  • the virtual machines that are spun up by the Gitlab CI

  • the nebula overlay network, which provides connectivity between all the aforementioned moving parts

topology

This topology diagram represents a Gitlab CI job that is triggered by a commit to the develop branch. All compute elements reside inside the nebula overlay network and communicate internally using their respective private IPs. The virtual machines that are spun up also join the network.

Pre-requisites

These are the conditions that must be met before the feature environment can run.

First of all, there is the nebula network, which is used for connectivity. It has a lighthouse and users with pre-signed certificates that are used to connect to and authenticate with the network. Therefore, one user is needed for each Gitlab CI runner, LXD host and LXD virtual machine. These users must be configured on a need-by-need basis, as there is no self-service, automated way to achieve this. For more information, see the nebula project in the nunet Gitlab group, as linked in the Architecture section.
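
To illustrate, signing a certificate for a new member of the network typically looks like the sketch below; the name, IP range and CA file paths are placeholders, and the actual values are managed in the nebula project:

# Sign a nebula certificate for a new compute element against the network CA
# (the name, the 192.168.100.0/24 range and the CA paths are placeholders).
nebula-cert sign -name "gitlab-runner-01" -ip "192.168.100.10/24" \
    -ca-crt ca.crt -ca-key ca.key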

Once nebula is configured for each of the compute elements, the LXD Hosts need to expose the LXD API to the Gitlab CI Runners. This can be done either by binding the LXD API to the internal nebula IP or by binding it globally (0.0.0.0). For instructions on how to configure the LXD API, see https://ubuntu.com/blog/directly-interacting-with-the-lxd-api.
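
For reference, binding the API on an LXD host is a single configuration change; a sketch assuming the host's nebula IP is 192.168.100.20 (address and port are placeholders):

# Expose the LXD API on the host's nebula address (8443 is the default port).
lxc config set core.https_address 192.168.100.20:8443

# Alternatively, bind it globally:
# lxc config set core.https_address 0.0.0.0:8443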

Feature Environment pipeline

This section describes the flow of the pipeline from the point of view of the Gitlab CI.

The overall state flow of the CI Pipeline for the feature environment is as follows:

pipeline-sequence

This graph represents the flow of the upstream and downstream pipelines that compose the feature environment.

Once code is pushed to the DMS develop branch, the project's pipeline is triggered. This is the upstream pipeline. Among all the jobs, the two of interest for the scope of this document are Build DMS and Trigger feature environment (job names in the diagram might differ from the actual names).

The downstream pipeline is triggered in the nunet/test-suite project. The job that creates the virtual machines pulls artifacts from the build job in order to pre-configure the virtual machines. Then the functional tests are run over SSH. The results of those jobs are sent to testmo and uploaded to the ci-reports webserver.

Once the tests have run, the LXD virtual machines are torn down.
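
Conceptually, the teardown amounts to destroying the Terraform-managed resources; a minimal sketch (the real invocation is wrapped by make.sh and may differ):

# Destroy the feature environment VMs tracked in the Terraform state.
terraform -chdir=infrastructure/dms-on-lxd destroy -auto-approve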

Spinning up the virtual machines

The following graph represents the communication flow from the Gitlab CI Runner to the LXD Hosts in order to spin up the virtual machines:

spin-up

In this diagram, the first thing that happens when the job that creates the virtual machines is triggered is that Gitlab CI pulls secrets from the vault. Those secrets are in the form of a base64-encoded config file. The specification of that file can be found at nunet/test-suite/infrastructure/dms-on-lxd/config.yml.dist.
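
Concretely, the job decodes that secret into a working config file before Terraform runs; a minimal sketch, assuming the secret reaches the job as a CI variable named DMS_ON_LXD_CONFIG_B64 (the variable name is hypothetical):

# Decode the base64-encoded config pulled from the vault into the config file
# expected by the dms-on-lxd scripts.
echo "$DMS_ON_LXD_CONFIG_B64" | base64 -d > infrastructure/dms-on-lxd/config.yml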

Then it uses the information in that config file to interact with the LXD API using terraform and the internal nebula IP addresses of the LXD Hosts. It is important to note that any unavailable hosts are filtered out. The pipeline won't halt, but it will complain visually, with a warning sign that one or more hosts couldn't be reached. The pipeline halts and attempts to destroy the infrastructure only if there is an error or if no LXD host is available.
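
A rough sketch of that filtering logic follows; the config keys and the use of yq and curl are assumptions, and the authoritative version lives in make.sh:

# Keep only the LXD hosts whose API answers over the nebula network.
reachable=""
for host in $(yq '.lxd_hosts[].address' config.yml); do
    if curl -ks --connect-timeout 5 "https://${host}:8443/" >/dev/null; then
        reachable="${reachable} ${host}"
    else
        echo "WARNING: LXD host ${host} is unreachable, skipping it" >&2
    fi
done

# With no reachable host there is nothing to provision: fail so the pipeline
# can halt and destroy whatever was created.
if [ -z "${reachable}" ]; then
    echo "ERROR: no LXD host is reachable" >&2
    exit 1
fi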

The entire process can be understood by looking at the file nunet/test-suite/infrastructure/dms-on-lxd/make.sh.

Running the tests

The following graph broadly illustrates the communication flow employed when running the tests:

spin-up

From the Gitlab job, feature tests are run using Gherkin and behave.

The scripts are set up so that they take an inventory file containing a list of IP addresses and a private SSH key, and run remote CLI commands over SSH. The command used to run behave in the pipeline is:

behave \
    -D inventory_file=$CI_PROJECT_DIR/infrastructure/dms-on-lxd/vms-ipv4-list.txt \
    -D ssh_key_file=$CI_PROJECT_DIR/infrastructure/dms-on-lxd/lxd-key \
    features/device-management-service/cli-tests/nunet_cli_lxd.feature

Both the files vms-ipv4-list.txt and lxd-key are generated by the make.sh script.
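
For illustration, since the inventory is a newline-separated list of IPv4 addresses, SSH access to the freshly created VMs can be checked with a loop like the one below; the ubuntu user is an assumption about the VM image:

# Iterate over the generated inventory and confirm SSH access to each VM.
while read -r ip; do
    ssh -i infrastructure/dms-on-lxd/lxd-key -o StrictHostKeyChecking=no \
        "ubuntu@${ip}" hostname
done < infrastructure/dms-on-lxd/vms-ipv4-list.txt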

Notifications

We use slack-notification for gitlab to implement a notification system that lets us see in Slack when a provisioning job fails unexpectedly. These jobs can be seen in the Feature Environment CICD pipeline.

It uses a webhook to the server-alerts channel, which is configured using the Slack API's incoming webhooks. There is a Slack app called Gitlab CI Notifications in which these webhooks are configured.

The webhook endpoint is stored in a Gitlab CI variable called SLACK_ALERTS_WEBHOOK which is configured at Nunet group's CICD Variables.

Maintaining the webhook is a matter of recreating the webhook endpoint for the target Slack channel, in case it ever expires, and updating the SLACK_ALERTS_WEBHOOK variable in the CICD variables of the nunet group with the new endpoint.
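
After rotating the endpoint, the new value can be sanity-checked with a plain incoming-webhook POST, for example:

# Send a test message through the rotated webhook to confirm it works.
curl -X POST -H 'Content-type: application/json' \
    --data '{"text": "Test message from the feature environment pipeline"}' \
    "$SLACK_ALERTS_WEBHOOK"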
