# Device Management Service Test Suite

## Introduction
This directory contains the test suite for the Device Management Service (DMS). The tests verify DMS functionality by creating a network of nodes and testing their interactions, ensuring that the core features of the DMS work correctly in a multi-node environment.
## Prerequisites
Before running the tests, ensure you have the following prerequisites installed:
- GlusterFS
- Docker
- The DMS binary, built and available in the test directory (optional when using the Make targets)
## GlusterFS Setup

Ensure GlusterFS is installed and pull the container:

```sh
sudo modprobe fuse
sudo apt install glusterfs-client
docker pull ghcr.io/gluster/gluster-containers:fedora
```
## How to Run

Using Make:

```sh
sudo make e2e
# or a specific test
sudo make e2e-DeploymentTests
```

Using Go:

```sh
go test -tags=e2e ./...
```

To run a specific test:

```sh
go test -tags=e2e -run TestE2E/BasicTests
```
Available test suites:

- `BasicTests`: Tests basic node communication
- `DeploymentTests`: Tests deployment functionality
- `DeploymentWithVolumesTests`: Tests deployment with storage volumes
- `StorageTests`: Tests storage functionality
## Structure

The test suite is organized as follows:

- `e2e_test.go`: Entry point for all tests
- `suite_test.go`: Defines the test suite structure and common functionality
- `client_test.go`: Client implementation for interacting with DMS nodes
- `basic_test.go`: Basic communication tests
- `deployment_test.go`: Tests for deployment functionality
- `glusterfs_test.go`: GlusterFS setup for storage tests
- `storage_test.go`: Tests for storage functionality
- `volume_test.go`: Tests for volume management
- `utils_test.go`: Utility functions for tests
- `testdata/`: Test deployment ensembles
## Key Components

- `TestSuite`: The main test suite that sets up a network of nodes
- `Client`: A wrapper around the DMS CLI for testing
- `prefixWriter`: Used to prefix node logs with node identifiers (sketched below)
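A minimal sketch of the `prefixWriter` idea, for orientation only (the real implementation in this suite may buffer partial lines differently):

```go
package e2e

import (
	"bytes"
	"fmt"
	"io"
)

// linePrefixer is an illustrative io.Writer that prepends a node
// identifier to every line, so interleaved node logs stay attributable.
type linePrefixer struct {
	prefix string
	out    io.Writer
}

func (w *linePrefixer) Write(p []byte) (int, error) {
	// SplitAfter keeps the trailing newline on each line.
	for _, line := range bytes.SplitAfter(p, []byte("\n")) {
		if len(line) == 0 {
			continue
		}
		if _, err := fmt.Fprintf(w.out, "[%s] %s", w.prefix, line); err != nil {
			return 0, err
		}
	}
	return len(p), nil
}
```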
## Best Practices

### 1. Parallelism
Tests should be run in parallel to speed up the test suite.
To add a new feature test that runs in parallel, the suggested workflow is:

- Create a new test file in the `tests/e2e` directory.

- Define a runner function that takes a `*TestSuite` parameter:

  ```go
  func NewFeatureTest(suite *TestSuite) {
      // New feature test implementation
  }
  ```

- Add your test to the `TestE2E` function in `e2e_test.go`:

  ```go
  t.Run("NewFeatureTests", func(t *testing.T) {
      t.Parallel()
      newFeatureTests := &TestSuite{
          numNodes:      3, // Adjust as needed
          Name:          "new_feature_tests",
          restPortIndex: 8100,  // Use unique port ranges
          p2pPortIndex:  10700, // Use unique port ranges
          runner:        NewFeatureTest,
      }
      suite.Run(t, newFeatureTests)
  })
  ```
### 2. Port Allocation

Each test suite must use unique port ranges to avoid conflicts:

- Allocate a unique `restPortIndex` and `p2pPortIndex` for each test suite
- Increment by at least 3 (for a 3-node test) from the previous test suite's ports (see the example below)
- Document the port ranges used in comments to avoid future conflicts
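For example, assuming one port per node (as the increment rule suggests), a suite following the `new_feature_tests` example above might reserve the next block. The values and names here are illustrative:

```go
// Illustrative only: new_feature_tests occupies 8100-8102 and
// 10700-10702 for its 3 nodes, so this suite starts one block later.
nextSuite := &TestSuite{
	numNodes:      3,
	Name:          "next_feature_tests", // hypothetical suite
	restPortIndex: 8103,  // 8100-8102 taken by new_feature_tests
	p2pPortIndex:  10703, // 10700-10702 taken by new_feature_tests
	runner:        NextFeatureTest,
}
```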
### 3. Resource Management

Tests should properly clean up resources:

- Use `t.Cleanup()` for Docker containers and other external resources (see the sketch below)
- Ensure all nodes are properly shut down in the `TearDownSuite` method
- Verify resource allocation and deallocation in deployment tests
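A minimal sketch of the `t.Cleanup()` pattern; the helper and its use of the GlusterFS image are illustrative, not the suite's actual setup code:

```go
package e2e

import (
	"os/exec"
	"strings"
	"testing"
)

// startHelperContainer shows how t.Cleanup guarantees the container is
// removed even when the test fails partway through.
func startHelperContainer(t *testing.T) string {
	out, err := exec.Command(
		"docker", "run", "-d", "ghcr.io/gluster/gluster-containers:fedora",
	).Output()
	if err != nil {
		t.Fatalf("starting container: %v", err)
	}
	id := strings.TrimSpace(string(out))

	// Registered cleanups run in LIFO order after the test and its
	// subtests finish, regardless of pass or fail.
	t.Cleanup(func() {
		if err := exec.Command("docker", "rm", "-f", id).Run(); err != nil {
			t.Logf("removing container %s: %v", id, err)
		}
	})
	return id
}
```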
### 4. Test Data Organization

- Store test ensembles in `testdata/ensembles/`
- Use descriptive names for test files
- When using dynamic hostnames, use the `replaceHostnameInFile` utility (sketched below)
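The real `replaceHostnameInFile` lives in `utils_test.go`; the signature and behavior below are an assumption, shown only to convey what such a utility does:

```go
// Hypothetical sketch: substitute a placeholder hostname in an ensemble
// file. Check utils_test.go for the actual signature and semantics.
func replaceHostnameInFile(path, placeholder, hostname string) error {
	data, err := os.ReadFile(path)
	if err != nil {
		return err
	}
	updated := strings.ReplaceAll(string(data), placeholder, hostname)
	return os.WriteFile(path, []byte(updated), 0o644)
}
```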
### 5. Error Handling

- Use `suite.Require()` instead of package-level assertions to ensure proper test failure tracking (see the example below)
- Add descriptive failure messages to assertions
- Use `suite.T().Logf()` for detailed logging during test execution
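Putting these together in a runner (the `listAllocations` helper is a placeholder for whatever client call the test actually makes):

```go
func NewFeatureTest(suite *TestSuite) {
	allocations, err := listAllocations(suite) // placeholder helper
	suite.Require().NoError(err, "listing allocations should succeed")
	suite.Require().NotEmpty(allocations, "expected at least one allocation")
	suite.T().Logf("found %d allocations", len(allocations))
}
```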
### 6. Test Isolation
- Each test suite should be completely independent
- Do not share state between test suites
- Use unique node directories and configurations
### 7. Timeouts and Retries

- Use `suite.Require().Eventually()` for operations that may take time to complete (see the example below)
- Set appropriate timeouts based on operation complexity
- Include descriptive timeout messages
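For example (`isAllocationRunning` and `allocationID` are placeholders; pick timeouts that match the operation's complexity):

```go
// Poll every 5s for up to 2 minutes, failing with a descriptive message
// if the condition never becomes true.
suite.Require().Eventually(func() bool {
	return isAllocationRunning(suite, allocationID) // placeholder
}, 2*time.Minute, 5*time.Second,
	"allocation %s never reached the running state", allocationID)
```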
### 8. Assertions

What to assert for deployments, and when:
Before deploying:

- (CP) resources:
  - free resources == onboarded resources
  - allocated resources == 0
- (CP) `allocations/list` is empty
After deploying:

Note: these checks mostly do not apply to transient allocations, since we usually use short-lived executions; to assert the following with task allocations, use transient allocations that run for more than 2 minutes.

- (CP) assert the allocation is running (cmd: `/dms/node/allocations`)
- (CP) assert the container is running, if possible
- (CP) assert resources:
  - allocated resources increased
  - free resources decreased
- (Orchestrator) assert the deployment status, depending on the allocation type
- (Orchestrator) assert the manifest
- (CP) assert that connectivity between containers works, for tests with multiple allocations
After completion (if tasks only) or after shutting the deployment down:

- (CP) assert the allocation is NOT listed (cmd: `/dms/node/allocations`)
- (CP) assert the container is not running, if possible
- (CP) assert resources:
  - allocated resources decreased
  - free resources increased
- (Orchestrator) assert the deployment status, depending on the allocation types
- (CP) assert the subnet is deleted (including the tunneling interface)
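A sketch of the resource bookkeeping checks described above; the `getResources` and `deploy` helpers and the snapshot fields are hypothetical stand-ins for the client's actual API:

```go
// Hypothetical snapshot of a node's resources before and after deploying.
before := getResources(suite, nodeIdx) // placeholder helper
suite.Require().Equal(before.Onboarded, before.Free,
	"free resources must equal onboarded resources before deploying")
suite.Require().Zero(before.Allocated, "nothing should be allocated yet")

deploy(suite) // placeholder for the deployment under test

after := getResources(suite, nodeIdx)
suite.Require().Greater(after.Allocated, before.Allocated,
	"allocated resources should increase after deploying")
suite.Require().Less(after.Free, before.Free,
	"free resources should decrease after deploying")
```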
## Debugging

- Create a debug binary:
  - `make linux_amd64_e2e`
  - `make setcap_e2e`
  - `make setcap_e2e_debug` (for `dlv`)
- Cache decrypted keys in the FS via `DMS_E2E_CACHE_KEYS=1`
- Delve-debug specific nodes via `DMS_E2E_DEBUG_NODES=0,2`; this opens ports `2340` and `2342` respectively
- Start remote debugging first
- Pause the runner after a remote breakpoint is hit (to avoid timeouts)
## Observability

- For production ELK, set `DMS_E2E_OBSERVE_API_KEY` to the Elasticsearch API key
- For local ELK, additionally set `DMS_E2E_OBSERVE_TOKEN` to the Elasticsearch APM secret token
- (Optional) set a custom prefix via `DMS_E2E_OBSERVE_PREFIX` to distinguish between test runs, e.g. `me` will result in `me/E2E/deployment_tests` (also available as a `prefix` label)