Ensemble Orchestration

In NuNet, compute workloads are organized as compute ensembles.
This section explains how ensembles are created, deployed, and supervised across the NuNet network.

What is a Compute Ensemble?

A compute ensemble is a collection of logical nodes and allocations that work together to perform a distributed workload.

  • Nodes represent the physical or virtual hardware where compute workloads run.
  • Allocations represent individual compute jobs assigned to nodes.
    Each node can host multiple allocations.

All allocations in an ensemble:

  • Receive a private IP address in the 10.0.0.0/8 range.
  • Are connected through a virtual private network (VPN) implemented via IP over libp2p.
  • Can communicate with one another via internal DNS.

Each allocation has:

  • A unique name and corresponding DNS name (e.g., allocation-name.internal).
  • A private IP accessible only within the ensemble.

Note: Allocation and node names must be unique within an ensemble.
Each ensemble has a globally unique UUID.

Ensemble Specification

To deploy an ensemble, users must define its structure and constraints in a YAML configuration file.
This file encodes the ensemble’s setup, including nodes, allocations, and supervision policies.

Core Structure

An ensemble specification includes:

  • Allocations:
    A map from allocation names to their configurations.
  • Nodes:
    A map from node names to their configurations.
  • Edges:
    A list of logical connections or constraints between nodes.
  • Additional Settings:
    Optional fields for SSH keys, provisioning scripts, and supervision policies.
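
Put together, a skeleton specification might look like the following sketch. The field names here are illustrative assumptions, not the authoritative DMS schema:

```yaml
# Hypothetical skeleton of an ensemble specification.
# Field names are illustrative, not the exact DMS v0.5 schema.
allocations:        # map: allocation name -> allocation configuration
  web: {}
  db: {}
nodes:              # map: node name -> node configuration
  node-1: {}
edges:              # list: logical connections/constraints between nodes
  - from: node-1
    to: node-2
ssh_keys: []        # optional: SSH keys for administrative access
provision: []       # optional: provisioning scripts
```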

Allocation Configuration

Each allocation has the following configuration fields:

1. Type

  • service: A long-running process that restarts automatically if it fails.
  • task: A one-time job that runs to completion.

2. Executor Type

Specifies the runtime environment for the allocation. Supported executors:

  • docker
  • firecracker

Future support:

  • wasm and other sandboxed runtimes.

3. Resources

Define compute requirements:

  • Memory, CPU cores, GPUs, etc.

4. Execution Details

Specify executor-specific configuration such as image, command, or entrypoint.

5. DNS Name

Optional DNS name for internal resolution.
If omitted, the allocation name is used.

6. SSH Keys

List of SSH keys for administrative access.

7. Provisioning Scripts

List of scripts to run during provisioning (executed in order).

8. Health Checks

Define custom health checks so the supervisor can monitor application status and handle failures.
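
The fields above could be combined roughly as follows. All field names and values are illustrative assumptions; consult the DMS reference for the exact schema:

```yaml
# Hypothetical allocation configuration (illustrative field names).
allocations:
  web:
    type: service            # service (restarts on failure) or task (runs once)
    executor: docker         # supported: docker, firecracker
    resources:
      memory_mb: 2048
      cpu_cores: 2
      gpus: 0
    execution:               # executor-specific details
      image: nginx:latest
      command: ["nginx", "-g", "daemon off;"]
    dns_name: web            # optional; defaults to the allocation name
    ssh_keys:
      - "ssh-ed25519 AAAA... admin@example.com"
    provision:               # scripts run in order during provisioning
      - ./setup.sh
    healthcheck:             # lets the supervisor detect and handle failures
      command: ["curl", "-f", "http://localhost/health"]
      interval: 30s
```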

Node Configuration

Each node configuration includes:

  • Allocations: The allocations assigned to the node.
  • Port Mapping: Rules for mapping public ports to internal allocation ports.
  • Location Constraints: Region, city, or ISP constraints.
  • Peer Assignment (Optional):
    Explicitly specify which peer should host the node — useful for organizations hosting sensitive data.
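
A node entry covering these fields might be sketched as below; the field names and the peer ID are illustrative placeholders, not the actual DMS schema:

```yaml
# Hypothetical node configuration (illustrative field names).
nodes:
  node-1:
    allocations: [web, db]       # allocations hosted on this node
    ports:
      - public: 443              # map a public port to an internal
        internal: 8443           # allocation port
        allocation: web
    location:
      region: eu-west            # region/city/ISP constraints
      exclude_isps: []
    peer: "12D3KooW..."          # optional: pin the node to a specific peer
```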

Future versions will support importing Kubernetes job descriptions and a declarative format for defining large-scale ensembles (e.g., 10k GPUs for LLM training).

Ensemble Constraints

Ensembles support user-defined constraints for precise control over deployments.

Supported Constraints (DMS v0.5)

  • Resource Constraints: Memory, cores, GPU details, etc.
  • Location Constraints: Specify or exclude certain regions, cities, or ISPs.
  • Edge Constraints: Define required bandwidth or latency between nodes.
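
An edge constraint between two nodes could be expressed along these lines (field names are illustrative assumptions):

```yaml
# Hypothetical edge constraints between two ensemble nodes (illustrative).
edges:
  - from: node-1
    to: node-2
    constraints:
      max_latency_ms: 20       # required latency bound between the nodes
      min_bandwidth_mbps: 100  # required bandwidth between the nodes
```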

Future Plans

Upcoming releases will add constraints for:

  • Contract existence
  • Price range
  • Explicit datacenter placement
  • Energy sources
  • Generalized graph-based constraint definitions

Ensemble Deployment

The deployment process determines how an ensemble is matched to available compute peers based on defined constraints.

Deployment is treated as a constraint satisfaction problem, where the orchestrator matches ensemble nodes to suitable provider peers.

Deployment Sequence Overview

  1. Initiate Deployment

    • The user invokes /dms/node/deployment/new on a DMS node (orchestrator host).
    • The orchestrator actor is created and assigned a unique ensemble ID.
    • The user can monitor progress and manage the ensemble using this ID.
  2. Request Bids

    • The orchestrator broadcasts /dms/deployment/request on the /nunet/deployment topic.
    • Provider nodes respond with bids if they match resource and capability requirements.
  3. Collect and Evaluate Bids

    • The orchestrator gathers bids within a timeout window.
    • If bids are incomplete, it rebroadcasts until all nodes have valid bids or the process times out.
  4. Constraint Evaluation

    • The orchestrator generates peer-to-node assignments and evaluates constraints (e.g., latency, bandwidth).
    • Invalid configurations are discarded.
  5. Commit Phase

    • Once a valid configuration is found, the orchestrator sends /dms/deployment/commit messages.
    • If any node fails to commit, the process reverts using /dms/deployment/revert.
  6. Provisioning

    • Allocations are provisioned via /dms/deployment/allocate.
    • VPN connections are established, IPs assigned, and workloads started.
  7. Supervision

    • Once active, the deployment is monitored continuously until shutdown or expiration.

In the future, users will be able to set deployment durations and enable features like auto-scaling and dynamic modification of running ensembles.

Ensemble Supervision

Once deployed, ensembles enter the supervision phase, where they are continuously monitored for:

  • Health status
  • Failures and restarts
  • Resource metrics

(Detailed documentation coming soon.)

Deploying in the NuNet Network

Understanding the authorization flow helps ensure secure and efficient deployments.

Actors in the System

  • U: the user initiating the deployment
  • O: the orchestrator (runs inside a DMS instance)
  • Nₒ: the node hosting the orchestrator
  • Pᵢ: the set of provider nodes available for deployment
  • Nₚᵢⱼ: individual DMS nodes owned by providers
  • Aᵢ: allocation actors for running workloads

Each actor has:

  • DID(x) — Decentralized Identifier
  • ID(x) — Ephemeral or persistent actor ID
  • Peer(x) — Peer ID
  • Root(x) — Root anchor of trust

Behaviors and Capabilities

To enable deployment, specific behaviors and capabilities must be authorized between actors.

Behavior Namespaces

  • User → Orchestrator Node: /dms/node/deployment
  • Orchestrator → Providers (broadcast): /dms/deployment/request via the /nunet/deployment topic
  • Orchestrator → Providers (pinned nodes): /dms/deployment/request (unicast)
  • Provider → Orchestrator: /dms/deployment/bid
  • Orchestrator → Providers (control): /dms/deployment
  • Orchestrator → Allocations: /dms/allocation (dynamic)
  • Orchestrator → Ensemble Nodes: /dms/ensemble/<ensemble-id> (dynamic)

Capability Requirements

  • User: requires /dms/node/deployment on the Orchestrator Node.
  • Orchestrator Node: requires /dms/deployment on Provider Nodes.
  • Provider Nodes: require /dms/deployment/bid on the Orchestrator.

Security and Access Control

NuNet’s decentralized actor model enforces fine-grained security:

  • Orchestrators only run on authorized DMS instances.
  • Bid requests are accepted only by trusted providers.
  • Bids are validated based on user-granted permissions.

This ensures:

  • Tight access control between actors.
  • Secure peer-to-peer deployment.
  • Transparent orchestration across decentralized compute resources.