Ensemble Orchestration
In NuNet, compute workloads are organized as compute ensembles.
This section explains how ensembles are created, deployed, and supervised across the NuNet network.
What is a Compute Ensemble?
A compute ensemble is a collection of logical nodes and allocations that work together to perform a distributed workload.
- Nodes represent the physical or virtual hardware where compute workloads run.
- Allocations represent individual compute jobs assigned to nodes.
Each node can host multiple allocations.
All allocations in an ensemble:
- Receive a private IP address in the `10/8` range.
- Are connected through a virtual private network (VPN) implemented via IP over libp2p.
- Can communicate with one another via internal DNS.
Each allocation has:
- A unique name and corresponding DNS name (e.g., `allocation-name.internal`).
- A private IP accessible only within the ensemble.
Note: Allocation and node names must be unique within an ensemble.
Each ensemble has a globally unique UUID.
Ensemble Specification
To deploy an ensemble, users must define its structure and constraints in a YAML configuration file.
This file encodes the ensemble’s setup, including nodes, allocations, and supervision policies.
Core Structure
An ensemble specification includes:
- Allocations: A map of allocation names to their configurations.
- Nodes: A map of node names to their configurations.
- Edges: A list of logical connections or constraints between nodes.
- Additional Settings: Optional fields for SSH keys, provisioning scripts, and supervision policies.
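A minimal sketch of this top-level layout might look as follows. The field names here are illustrative assumptions, not the exact DMS schema; consult the DMS reference for authoritative field names.

```yaml
# Hypothetical top-level ensemble spec layout (field names are illustrative).
allocations:
  web: {}            # allocation name -> allocation configuration
nodes:
  node-a:
    allocations: [web]
edges:
  - from: node-a
    to: node-b
ssh_keys: []         # optional administrative SSH keys
scripts: []          # optional provisioning scripts
supervision: {}      # optional supervision policies
```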
Allocation Configuration
Each allocation has the following configuration fields:
1. Type
- `service`: A long-running process that restarts automatically if it fails.
- `task`: A one-time job that runs to completion.
2. Executor Type
Specifies the runtime environment for the allocation. Supported executors:
- `docker`
- `firecracker`
Future support is planned for `wasm` and other sandboxed runtimes.
3. Resources
Define compute requirements:
- Memory, CPU cores, GPUs, etc.
4. Execution Details
Specify executor-specific configuration such as image, command, or entrypoint.
5. DNS Name
Optional DNS name for internal resolution.
If omitted, the allocation name is used.
6. SSH Keys
List of SSH keys for administrative access.
7. Provisioning Scripts
List of scripts to run during provisioning (executed in order).
8. Health Checks
Define custom health checks so the supervisor can monitor application status and handle failures.
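Putting the eight fields together, a single allocation might be configured roughly as follows. This is a hedged sketch: the field names and health-check shape are assumptions for illustration, not the exact DMS schema.

```yaml
# Hypothetical allocation configuration covering the fields above (illustrative only).
allocations:
  api:
    type: service                 # 1. restarts automatically on failure
    executor: docker              # 2. runtime environment
    resources:                    # 3. compute requirements
      memory_mb: 1024
      cpus: 2
    execution:                    # 4. executor-specific details
      image: ghcr.io/example/api:1.0
      command: ["./server", "--port", "8080"]
    dns_name: api.internal        # 5. optional; defaults to the allocation name
    ssh_keys:                     # 6. administrative access
      - "ssh-ed25519 AAAA... admin@example"
    provision:                    # 7. run in order during provisioning
      - "apt-get update"
    healthcheck:                  # 8. lets the supervisor detect failures
      http: "http://localhost:8080/healthz"
      interval_s: 30
```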
Node Configuration
Each node configuration includes:
- Allocations: The allocations assigned to the node.
- Port Mapping: Rules for mapping public ports to internal allocation ports.
- Location Constraints: Region, city, or ISP constraints.
- Peer Assignment (Optional):
Explicitly specify which peer should host the node — useful for organizations hosting sensitive data.
Future versions will support importing Kubernetes job descriptions and a declarative format for defining large-scale ensembles (e.g., 10k GPUs for LLM training).
Ensemble Constraints
Ensembles support user-defined constraints for precise control over deployments.
Supported Constraints (DMS v0.5)
- Resource Constraints: Memory, cores, GPU details, etc.
- Location Constraints: Specify or exclude certain regions, cities, or ISPs.
- Edge Constraints: Define required bandwidth or latency between nodes.
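These constraints might be expressed in the spec roughly as follows. The field names are hypothetical, chosen only to illustrate the three constraint categories.

```yaml
# Hypothetical constraint fields (illustrative, not the DMS schema).
nodes:
  trainer:
    constraints:
      resources:               # resource constraints
        memory_mb: 32768
        gpus:
          - model: "A100"
            count: 2
      location:                # location constraints
        region: "eu-west"
        exclude_isp: ["ExampleISP"]
edges:
  - from: trainer              # edge constraints between nodes
    to: storage
    max_latency_ms: 10
    min_bandwidth_mbps: 1000
```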
Future Plans
Upcoming releases will add constraints for:
- Contract existence
- Price range
- Explicit datacenter placement
- Energy sources
- Generalized graph-based constraint definitions
Ensemble Deployment
The deployment process determines how an ensemble is matched to available compute peers based on defined constraints.
Deployment is treated as a constraint satisfaction problem, where the orchestrator matches ensemble nodes to suitable provider peers.
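The matching step can be sketched as a search over peer-to-node assignments, rejecting any assignment that violates a constraint. This is an illustrative simplification: the function names, bid fields, and brute-force search strategy are assumptions for exposition, not the DMS implementation.

```python
# Sketch of bid matching as constraint satisfaction (illustrative only).
from itertools import permutations

def satisfies(node_req, bid):
    """A bid satisfies a node if it offers at least the requested resources."""
    return (bid["memory_mb"] >= node_req["memory_mb"]
            and bid["cpus"] >= node_req["cpus"])

def match(nodes, bids):
    """Try peer-to-node assignments; return the first valid one, else None."""
    names = list(nodes)
    for assignment in permutations(bids, len(names)):
        if all(satisfies(nodes[n], b) for n, b in zip(names, assignment)):
            return dict(zip(names, (b["peer"] for b in assignment)))
    return None  # no valid configuration: rebroadcast or time out

nodes = {"node-a": {"memory_mb": 512, "cpus": 1},
         "node-b": {"memory_mb": 2048, "cpus": 4}}
bids = [{"peer": "p1", "memory_mb": 1024, "cpus": 2},
        {"peer": "p2", "memory_mb": 4096, "cpus": 8}]
print(match(nodes, bids))  # → {'node-a': 'p1', 'node-b': 'p2'}
```

A real orchestrator would prune the search with heuristics and also check edge constraints (latency, bandwidth) between assigned peers, but the shape of the problem is the same.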
Deployment Sequence Overview
1. Initiate Deployment
   - The user invokes `/dms/node/deployment/new` on a DMS node (the orchestrator host).
   - The orchestrator actor is created and assigned a unique ensemble ID.
   - The user can monitor progress and manage the ensemble using this ID.
2. Request Bids
   - The orchestrator broadcasts `/dms/deployment/request` on the `/nunet/deployment` topic.
   - Provider nodes respond with bids if they match the resource and capability requirements.
3. Collect and Evaluate Bids
   - The orchestrator gathers bids within a timeout window.
   - If bids are incomplete, it rebroadcasts until all nodes have valid bids or the process times out.
4. Constraint Evaluation
   - The orchestrator generates peer-to-node assignments and evaluates constraints (e.g., latency, bandwidth).
   - Invalid configurations are discarded.
5. Commit Phase
   - Once a valid configuration is found, the orchestrator sends `/dms/deployment/commit` messages.
   - If any node fails to commit, the process reverts using `/dms/deployment/revert`.
6. Provisioning
   - Allocations are provisioned via `/dms/deployment/allocate`.
   - VPN connections are established, IPs assigned, and workloads started.
7. Supervision
   - Once active, the deployment is monitored continuously until shutdown or expiration.
In the future, users will be able to set deployment durations and enable features like auto-scaling and dynamic modification of running ensembles.
Ensemble Supervision
Once deployed, ensembles enter the supervision phase, where they are continuously monitored for:
- Health status
- Failures and restarts
- Resource metrics
(Detailed documentation coming soon.)
Deploying in the NuNet Network
Understanding the authorization flow helps ensure secure and efficient deployments.
Actors in the System
| Symbol | Description |
|---|---|
| U | The user initiating the deployment |
| O | The orchestrator (runs inside a DMS instance) |
| Nₒ | Node hosting the orchestrator |
| Pᵢ | Set of provider nodes available for deployment |
| Nₚᵢⱼ | Individual DMS nodes owned by providers |
| Aᵢ | Allocation actors for running workloads |
Each actor has:
- `DID(x)` — Decentralized Identifier
- `ID(x)` — Ephemeral or persistent actor ID
- `Peer(x)` — Peer ID
- `Root(x)` — Root anchor of trust
Behaviors and Capabilities
To enable deployment, specific behaviors and capabilities must be authorized between actors.
Behavior Namespaces
| Interaction | Namespace |
|---|---|
| User → Orchestrator Node | `/dms/node/deployment` |
| Orchestrator → Providers (broadcast) | `/dms/deployment/request` via `/nunet/deployment` |
| Orchestrator → Providers (pinned nodes) | `/dms/deployment/request` (unicast) |
| Provider → Orchestrator | `/dms/deployment/bid` |
| Orchestrator → Providers (control) | `/dms/deployment` |
| Orchestrator → Allocations | `/dms/allocation` (dynamic) |
| Orchestrator → Ensemble Nodes | `/dms/ensemble/<ensemble-id>` (dynamic) |
Capability Requirements
| Actor | Required Capability | Target |
|---|---|---|
| User | `/dms/node/deployment` | Orchestrator Node |
| Orchestrator Node | `/dms/deployment` | Provider Nodes |
| Provider Nodes | `/dms/deployment/bid` | Orchestrator |
Security and Access Control
NuNet’s decentralized actor model enforces fine-grained security:
- Orchestrators only run on authorized DMS instances.
- Bid requests are accepted only by trusted providers.
- Bids are validated based on user-granted permissions.
This ensures:
- Tight access control between actors.
- Secure peer-to-peer deployment.
- Transparent orchestration across decentralized compute resources.