This package is responsible for starting the whole application. It also contains core functionality of DMS:
Onboarding compute provider devices
Job orchestration and management
Resource management
Actor implementation for each node
Here is a quick overview of the contents of this package:
README: Current file, which is aimed at developers who wish to use and modify the DMS functionality.
dms: This file contains code to initialize the DMS by loading configuration, starting the REST API server, etc.
init: This file creates a new logger instance.
sanity_check (proposed): This file defines a method for performing a consistency check before starting the DMS. Note that the functionality of this method needs to be developed as per the refactored DMS design.
Subpackages
jobs: Deals with the management of local jobs on the machine.
node: Contains the implementation of Node as an actor.
onboarding: Code related to onboarding of compute provider machines to the network.
orchestrator: Contains job orchestration logic.
resources: Deals with the management of resources on the machine.
proposed: All files with the *_test.go naming convention contain unit tests for the corresponding implementation.
The class diagram for the dms package is shown below.
Source file
Rendered from source file
TBD
Note: the functionality of DMS is currently being developed. See the proposed section for the suggested design of interfaces and methods.
Supervision
TBD as per proposed implementation
Supervisor
SupervisorStrategy
Statistics
TBD
Note: the functionality of DMS is currently being developed. See the proposed section for the suggested data types.
proposed
Refer to the *_test.go files for unit tests of the different functionalities.
List of issues
All issues that are related to the implementation of the dms package can be found below. These include any proposals for modifications to the package or new functionality needed to cover the requirements of other packages.
Interfaces & Methods
proposed
Capability_interface
add method will combine the capabilities of two nodes. Example usage: when two jobs have to be run on a single machine, the capability requirements of each will need to be combined.
subtract method will subtract one capability from another. Example usage: when resources are locked for a job, the available capability of a machine will need to be reduced.
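As a minimal sketch of these proposed semantics (all type and field names here are hypothetical, since the Capability model is still being designed), combining and subtracting capabilities could look like:

```go
// Capability is a hypothetical, simplified capability vector; the real
// proposed model also covers connectivity, price and time information.
type Capability struct {
	CPUCores int
	RAMMB    int
	DiskMB   int
}

// Add combines the capability requirements of two jobs, e.g. when both
// must run on a single machine.
func (c Capability) Add(other Capability) Capability {
	return Capability{
		CPUCores: c.CPUCores + other.CPUCores,
		RAMMB:    c.RAMMB + other.RAMMB,
		DiskMB:   c.DiskMB + other.DiskMB,
	}
}

// Subtract reduces the available capability of a machine, e.g. when
// resources are locked for a job.
func (c Capability) Subtract(other Capability) Capability {
	return Capability{
		CPUCores: c.CPUCores - other.CPUCores,
		RAMMB:    c.RAMMB - other.RAMMB,
		DiskMB:   c.DiskMB - other.DiskMB,
	}
}
```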
Data types
proposed
dms.Capability
The Capability struct will capture all the relevant data that defines the capability of a node to perform a job. At the same time, it will be used to define the capability requirements that a job demands from a node.
An initial data model for Capability is defined below.
proposed
dms.Connectivity
type Connectivity struct {
}
proposed
dms.PriceInformation
proposed
dms.TimeInformation
type TimeInformation struct {
	// Units holds the units of time, e.g. hours, days, weeks
	Units string
}
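Since the PriceInformation model is not yet specified and Capability itself is still proposed, the following is only a hedged sketch of how these component types might compose (all fields below are hypothetical):

```go
// PriceInformation might capture price-related requirements or offers;
// both fields are hypothetical placeholders.
type PriceInformation struct {
	Currency string
	MaxPrice float64
}

// Capability could then aggregate the component types defined above.
type Capability struct {
	Connectivity Connectivity
	Price        PriceInformation
	Time         TimeInformation
}
```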
The DMS Behaviors are a set of functionalities, organized in a hierarchical namespace, that the DMS performs when requested by an actor that has the necessary capabilities. Since capabilities are hierarchical, they can apply either as an exact match to a behavior or be implied by a top-level capability. For example, the /dms/node/peers/ping behavior can be accessed by an actor holding either the specific /dms/node/peers/ping capability or the /dms/node capability, which implies everything below it.
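As a minimal illustration of this implication rule (a sketch of the semantics described above, not the DMS's actual token-verification code), hierarchical matching boils down to a path-prefix check:

```go
package main

import (
	"fmt"
	"strings"
)

// implies reports whether a granted capability namespace covers a
// requested behavior path, either by exact match or because the
// behavior sits below the capability in the hierarchy.
func implies(capability, behavior string) bool {
	if capability == behavior {
		return true
	}
	return strings.HasPrefix(behavior, capability+"/")
}

func main() {
	fmt.Println(implies("/dms/node", "/dms/node/peers/ping"))            // true
	fmt.Println(implies("/dms/node/peers/ping", "/dms/node/peers/ping")) // true
	fmt.Println(implies("/dms/node/peers", "/dms/deployment/bid"))       // false
}
```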
DMS Node Namespace: /dms/node
Description: Everything related to the management of a DMS node. Any capability implied by this namespace will be able to access all the behaviors below it. These should only be granted to the controller/user of the DMS. Normally, this is done by anchoring the DID of the controlling user to the root anchor of the DMS, which allows the controller unlimited root access.
For fine-grained control, the following capabilities can be used:
The following Peer capabilities are directly associated with peer behaviors under the same path. They allow the controller of a DMS to request it to perform these actions. For example, when the PeerSelf behavior is invoked by the controller, it's the peer address of the DMS node that is returned.
PeerPingBehavior: /dms/node/peers/ping
Description: Ping a peer to check if it is alive.
PeersListBehavior: /dms/node/peers/list
Description: List peers visible to the node.
PeerSelfBehavior: /dms/node/peers/self
Description: Get the peer id and listening address of the node.
PeerDHTBehavior: /dms/node/peers/dht
Description: Get the peers in DHT of the node along with their DHT parameters.
PeerConnectBehavior: /dms/node/peers/connect
Description: Connect to a peer.
PeerScoreBehavior: /dms/node/peers/score
Description: Get the libp2p pubsub peer score of peers.
The following capabilities deal with onboarding the DMS node as a compute provider on the network. Onboarding a node involves setting a specific amount of compute resources for the node to allocate to incoming jobs.
OnboardBehavior: /dms/node/onboarding/onboard
Description: Onboard the node as a compute provider.
OffboardBehavior: /dms/node/onboarding/offboard
Description: Offboard the node as a compute provider.
OnboardStatusBehavior: /dms/node/onboarding/status
Description: Get the onboarding status. Whether the node is onboarded or not and errors if any.
NewDeploymentBehavior: /dms/node/deployment/new
Description: This node behavior is invoked by the controller to start a new deployment on the node. It takes an ensemble config as input and returns the deployment id.
DeploymentListBehavior: /dms/node/deployment/list
Description: List all the deployments orchestrated by the node.
DeploymentLogsBehavior: /dms/node/deployment/logs
Description: Get the logs of a particular deployment.
DeploymentStatusBehavior: /dms/node/deployment/status
Description: Get the status of a deployment.
DeploymentManifestBehavior: /dms/node/deployment/manifest
Description: Get the manifest of a deployment.
DeploymentShutdownBehavior: /dms/node/deployment/shutdown
Description: Shutdown a deployment.
ResourcesAllocatedBehavior: /dms/node/resources/allocated
Description: The behavior returns the amount of resources allocated to Allocations running on the node. Allocated resources should always be less than or equal to the onboarded resources.
ResourcesFreeBehavior: /dms/node/resources/free
Description: The behavior returns the amount of resources that are free to be allocated on the node. Free resources should always be less than or equal to the onboarded resources.
ResourcesOnboardedBehavior: /dms/node/resources/onboarded
Description: The behavior returns the amount of resources the node is onboarded with.
HardwareSpecBehavior: /dms/node/hardware/spec
Description: The behavior returns the hardware resource specification of the machine.
HardwareUsageBehavior: /dms/node/hardware/usage
Description: The behavior returns the full resource usage on the machine including usage by other processes.
LoggerConfigBehavior: /dms/node/logger/config
Description: Configure the logger/observability config of the node.
The following capabilities are associated with the deployment of jobs on a DMS node. These capabilities and behaviors allow the controller to deploy services on nodes, list the deployed ensembles, get the logs from deployed allocations, get the status of the deployment, get the manifest of the deployment, and shut down the deployment.
During regular use, it's recommended that compute providers delegate the /dms/deployment capability to orchestrators.
BidRequestBehavior: /dms/deployment/request
Description: The behavior and capability that will need to be invoked by an orchestrator and delegated from a compute provider to the orchestrator. It allows the orchestrator to request a bid from the compute provider for a specific ensemble.
BidReplyBehavior: /dms/deployment/bid
Description: The behavior and capability that will need to be invoked by a compute provider and delegated from an orchestrator to the compute provider. It allows the compute provider to reply to a bid request from an orchestrator.
CommitDeploymentBehavior: /dms/deployment/commit
Description: The associated behavior with this capability allows an orchestrator to temporarily commit the resources the provider bid on until full allocation.
AllocationDeploymentBehavior: /dms/deployment/allocate
Description: The associated behavior with this capability allows an orchestrator to allocate the resources the provider bid on after having committed it temporarily.
RevertDeploymentBehavior: /dms/deployment/revert
Description: The associated behavior with this capability allows an orchestrator to revert any commit or allocation done during a deployment.
Capability behaviors allow remote nodes to configure capability tokens on the node. The receiver node needs to have delegated the /dms/cap capability to the invoking node.
CapListBehavior: /dms/cap/list
Description: The behavior and associated capability allow getting a list of all the capabilities another node has. The capability should be delegated to the node that needs to get the list of capabilities.
CapAnchorBehavior: /dms/cap/anchor
Description: Allows anchoring capability tokens on another node.
The following capabilities are associated with public behaviors that can be invoked by any actor on the network. These capabilities are normally granted, via the /public capability, to all actors on the network that are KYC'd by NuNet. However, some nodes may choose to restrict these capabilities to specific actors and may not reply to invocations.
PublicHelloBehavior: /public/hello
Description: A public hello behavior where any actor can invoke it on a specific node/actor and get a hello message back if public capability has been granted.
PublicStatusBehavior: /public/status
Description: Invoking this behavior on a node will cause it to reply with its total resource amount it has on the machine along with an error message if any.
BroadcastHelloBehavior: /broadcast/hello
Description: A public hello broadcast in which any actor/node that receives it will reply with a hello message along with its DID.
Allocation capabilities are normally granted to orchestrators once a deployment starts running, to allow the orchestrator to manage the allocations it deployed. These capabilities are normally granted temporarily, since the allocations themselves are ephemeral and live only for the duration of the deployment.
AllocationStartBehavior: /dms/allocation/start
Description: Start an allocation after a deployment.
AllocationRestartBehavior: /dms/allocation/restart
Description: Restart an allocation after a deployment has been started.
RegisterHealthcheckBehavior: /dms/actor/healthcheck/register
Description: Register a new healthcheck mechanism for an allocation.
These too are associated with allocations and are granted to orchestrators once a deployment starts running. These capabilities allow the orchestrator to manage the subnet of the allocations it deployed, so that allocations can communicate over an IP layer on top of the p2p network.
SubnetAddPeerBehavior: /dms/allocation/subnet/add-peer
Description: Add a peer to a subnet.
SubnetRemovePeerBehavior: /dms/allocation/subnet/remove-peer
Description: Remove a peer from a subnet.
SubnetAcceptPeerBehavior: /dms/allocation/subnet/accept-peer
Description: Accept a peer in a subnet.
SubnetMapPortBehavior: /dms/allocation/subnet/map-port
Description: Map a port in a subnet. The mapping will be between the subnet ip and the port on the executor.
SubnetUnmapPortBehavior: /dms/allocation/subnet/unmap-port
Description: Unmap a port in a subnet.
SubnetDNSAddRecordsBehavior: /dms/allocation/subnet/dns/add-records
Description: Add DNS records to a subnet. Normally these records identify the allocations within the subnet. Each Allocation can have a dns_name parameter that can be used to identify the allocation; if it is not provided, the allocation name is used instead. DNS names have a .internal suffix but can be used without it, since the resolver within the executor adds it automatically if it supports it.
SubnetDNSRemoveRecordBehavior: /dms/allocation/subnet/dns/remove-record
Description: Remove a DNS record from a subnet.
Allocation Ensemble Capabilities are dynamic type namespaces that are created when an ensemble is deployed on a node. These capabilities are granted to orchestrators once a deployment starts running to allow the orchestrator to manage the allocations it deployed. These capabilities are normally granted temporarily and live only as long as the ensemble.
EnsembleNamespace: /dms/ensemble/%s
Description: A dynamic namespace that allows the controller to interact with ensembles on the node. The %s will be replaced by the ensemble id once the deployment is running.
AllocationLogsBehavior: /dms/ensemble/%s/allocation/logs
Description: Get the logs of an allocation in an ensemble.
AllocationShutdownBehavior: /dms/ensemble/%s/allocation/shutdown
Description: Shutdown an allocation in an ensemble.
SubnetCreateBehavior:
DynamicTemplate: /dms/ensemble/%s/node/subnet/create
Static: /dms/node/subnet/create
Description: Create a new subnet for an ensemble. This request is supposed to be received by the node of the compute provider and created for the allocations it creates for the ensemble.
SubnetDestroyBehavior:
DynamicTemplate: /dms/ensemble/%s/node/subnet/destroy
Static: /dms/node/subnet/destroy
Description: Destroy a subnet for an ensemble. This request is supposed to be received by the node of the compute provider.
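To make the dynamic templates concrete, here is a trivial illustration of instantiating them (the ensemble id is hypothetical; in practice it is the identifier returned by /dms/node/deployment/new):

```go
package main

import "fmt"

func main() {
	// Hypothetical ensemble id returned by /dms/node/deployment/new.
	ensembleID := "3f1c2d84-9a7b-4e61-8c2f-5d0e6a1b2c3d"

	// The %s placeholder in the dynamic templates is filled with the
	// ensemble id once the deployment is running.
	fmt.Printf("/dms/ensemble/%s/allocation/logs\n", ensembleID)
	fmt.Printf("/dms/ensemble/%s/node/subnet/create\n", ensembleID)
}
```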
This whole package is in proposed status and therefore documentation is missing, save for the proposed functionality part.
TBD
TBD
Source
Rendered from source file
TBD
TBD
TBD
List of issues
All issues that are filed in GitLab related to the implementation of the dms/orchestrator package can be found below. These include any proposals for modifications to the package or new functionality needed to cover the requirements of other packages.
Proposed functionalities
TBD
Data types
proposed
LocalNetworkTopology
More complex deployments may need a data structure that considers the local network topology of a node/DMS, i.e. for reasoning about the speed of connection (as well as capabilities) between neighbors.
Related research blogs
TBD
This file explains the onboarding functionality of the Device Management Service (DMS). This functionality is catered towards compute providers who wish to provide their hardware resources to NuNet for running computational tasks, as well as developers who are contributing to platform development.
Here is a quick overview of the contents of this directory:
: Current file, aimed at developers who wish to modify the onboarding functionality and build on top of it.
: This is the main file where the code for the onboarding functionality exists.
: This file houses functions to generate Cardano wallet addresses along with their private keys.
: This file houses functions to test the address generation functions defined above.
: This file houses functions to get the total capacity of the machine being onboarded.
: This file initializes the loggers associated with the onboarding package.
All the tests for the onboarding package can be found in the file.
The class diagram for the onboarding package is shown below.
Source file
Rendered from source file
Onboard
signature: Onboard(ctx context.Context, config types.OnboardingConfig) error
input #1: Context object
input #2: types.OnboardingConfig
output (error): Error message
The Onboard function executes the onboarding process for a compute provider based on the configuration provided.
signature: Offboard(ctx context.Context) error
input #1: Context object
output: None
output (error): Error message
Offboard removes the resources onboarded to NuNet.
signature: IsOnboarded(ctx context.Context) (bool, error)
input #1: Context object
output #1: bool
output #2: error
IsOnboarded checks if the compute provider is onboarded.
signature: Info(ctx context.Context) (types.OnboardingConfig, error)
input #1: Context object
output #1: types.OnboardingConfig
output #2: error
Info returns the configuration of the onboarding process.
types.OnboardingConfig: Holds the configuration for onboarding a compute provider.
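A hedged usage sketch of these four functions follows; the Onboarder interface and OnboardingConfig placeholder are illustrative wrappers around the documented signatures, not the package's actual API shape (package and import boilerplate is omitted):

```go
// OnboardingConfig stands in for types.OnboardingConfig.
type OnboardingConfig struct{}

// Onboarder is an illustrative wrapper around the documented signatures.
type Onboarder interface {
	Onboard(ctx context.Context, config OnboardingConfig) error
	Offboard(ctx context.Context) error
	IsOnboarded(ctx context.Context) (bool, error)
	Info(ctx context.Context) (OnboardingConfig, error)
}

// ensureOnboarded shows the expected call order: check the onboarding
// status first, then onboard with the provided configuration if needed.
func ensureOnboarded(ctx context.Context, o Onboarder, cfg OnboardingConfig) error {
	onboarded, err := o.IsOnboarded(ctx)
	if err != nil {
		return err
	}
	if onboarded {
		return nil
	}
	return o.Onboard(ctx, cfg)
}
```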
List of issues
All issues that are related to the implementation of the dms package can be found below. These include any proposals for modifications to the package or new functionality needed to cover the requirements of other packages.
proposed
Description: This package is responsible for the creation of a Node object, which is the main actor residing on the machine as long as DMS is running. The Node gets created when the DMS is onboarded.
The Node is responsible for:
Communicating with other actors (nodes and allocations) via messages. This includes sending bid requests, bids, invocations, job status, etc.
Checking used and free resources before creating allocations
Continuous monitoring of the machine
Here is a quick overview of the contents of this package:
The class diagram for the node package is shown below.
Source file
Rendered from source file
TBD
TBD
proposed
Refer to the *_test.go files for unit tests of the different functionalities.
List of issues
All issues that are related to the implementation of the dms package can be found below. These include any proposals for modifications to the package or new functionality needed to cover the requirements of other packages.
Interfaces & Methods
proposed
Node_interface
getAllocation method retrieves an Allocation on the machine based on the provided AllocationID.
checkAllocationStatus method will retrieve the status of an Allocation.
routeToAllocation method will route a message to the Allocation of the job that is running on the machine.
benchmarkCapability method will perform machine benchmarking.
setRegisteredCapability method will record the benchmarked Capability of the machine in a persistent data store for retrieval and usage (mostly in job orchestration functionality).
getRegisteredCapability method will retrieve the benchmarked Capability of the machine from the persistent data store.
setAvailableCapability method changes the available capability of the machine when resources are locked.
getAvailableCapability method will return the currently available capability of the node.
lockCapability method will lock a certain amount of resources for a job. This can happen during bid submission, but it must happen once the job is accepted and before invocation.
getLockedCapabilities method retrieves the locked capabilities of the machine.
setPreferences method sets the preferences of a node as a dms.orchestrator.CapabilityComparator.
getPreferences method retrieves the node preferences as a dms.orchestrator.CapabilityComparator.
getRegisteredBids method retrieves the list of bids received for a job.
startAllocation method will create an allocation based on the invocation received.
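A hypothetical Go rendering of this proposed interface; the method signatures below are inferred from the descriptions above, and the placeholder types stand in for the proposed data types:

```go
// Placeholder types for illustration; the real definitions are part of
// the proposed design.
type (
	AllocationID         string
	Allocation           struct{}
	AllocationStatus     string
	Message              struct{}
	Capability           struct{}
	CapabilityComparator struct{}
	Bid                  struct{}
	Invocation           struct{}
)

// Node is a hypothetical sketch, not the canonical interface.
type Node interface {
	GetAllocation(id AllocationID) (*Allocation, error)
	CheckAllocationStatus(id AllocationID) (AllocationStatus, error)
	RouteToAllocation(id AllocationID, msg Message) error

	BenchmarkCapability() (Capability, error)
	SetRegisteredCapability(c Capability) error
	GetRegisteredCapability() (Capability, error)
	SetAvailableCapability(c Capability) error
	GetAvailableCapability() (Capability, error)
	LockCapability(jobID string, c Capability) error
	GetLockedCapabilities() ([]Capability, error)

	SetPreferences(p CapabilityComparator) error
	GetPreferences() (CapabilityComparator, error)

	GetRegisteredBids(jobID string) ([]Bid, error)
	StartAllocation(inv Invocation) (AllocationID, error)
}
```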
Data types
proposed
dms.node.Node
An initial data model for Node is defined below.
proposed
dms.node.NodeID
The hardware package is responsible for handling the hardware related functionalities of the DMS.
Here is a quick overview of the contents of this package:
cpu: This package contains the functionality related to the CPU of the device.
ram.go: This file contains the functionality related to the RAM.
disk.go: This file contains the functionality related to the Disk.
gpu: This package contains the functionality related to the GPU of the device.
GetMachineResources()
signature: GetMachineResources() (types.MachineResources, error)
input: None
output: types.MachineResources
output(error): error
GetCPU()
signature: GetCPU() (types.CPU, error)
input: None
output: types.CPU
output(error): error
GetRAM()
signature: GetRAM() (types.RAM, error)
input: None
output: types.RAM
output(error): error
GetDisk()
signature: GetDisk() (types.Disk, error)
input: None
output: types.Disk
output(error): error
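A hedged usage sketch of these functions (the import path is an assumption about the repository layout, not a confirmed module path):

```go
package main

import (
	"fmt"
	"log"

	// Assumed import path for illustration; adjust to the actual
	// module layout of the DMS repository.
	"gitlab.com/nunet/device-management-service/dms/hardware"
)

func main() {
	// Query the full machine specification first.
	machine, err := hardware.GetMachineResources()
	if err != nil {
		log.Fatalf("reading machine resources: %v", err)
	}
	fmt.Printf("machine: %+v\n", machine)

	// Individual components can also be queried directly.
	cpu, err := hardware.GetCPU()
	if err != nil {
		log.Fatalf("reading CPU details: %v", err)
	}
	fmt.Printf("cpu: %+v\n", cpu)
}
```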
The hardware types can be found in the types package.
The tests can be found in the *_test.go
files in the respective packages.
The resources package deals with resource management for the machine. This includes the calculation of available resources for new jobs or bid requests.
Here is a quick overview of the contents of this package:
README: Current file which is aimed towards developers who wish to use and modify the DMS functionality.
init: Contains the initialization of the package.
resource_manager: Contains the resource manager, which is responsible for managing the resources of the DMS.
usage_monitor: Contains the implementation of the UsageMonitor interface.
store: Contains the implementation of the store for the resource manager.
All files with *_test.go contain unit tests for the corresponding functionality.
The class diagram for the resources package is shown below.
Source file
Rendered from source file
Manager Interface
The interface methods are explained below.
AllocateResources
signature: AllocateResources(context.Context, ResourceAllocation) error
input #1: Context
input #2: ResourceAllocation
output (error): Error message
AllocateResources allocates the resources to the job.
DeallocateResources
signature: DeallocateResources(context.Context, string) error
input #1: Context
input #2: string (job identifier)
output (error): Error message
DeallocateResources deallocates the resources from the job.
GetTotalAllocation
signature: GetTotalAllocation() (Resources, error)
input: None
output: Resources
output (error): Error message
GetTotalAllocation returns the total resources allocated to the jobs.
GetFreeResources
signature: GetFreeResources() (FreeResources, error)
input: None
output: FreeResources
output (error): Error message
GetFreeResources returns the available resources in the allocation pool.
GetOnboardedResources
signature: GetOnboardedResources(context.Context) (OnboardedResources, error)
input: Context
output: OnboardedResources
output (error): Error message
GetOnboardedResources returns the resources onboarded to the DMS.
UpdateOnboardedResources
signature: UpdateOnboardedResources(context.Context, OnboardedResources) error
input: Context
input: OnboardedResources
output (error): Error message
UpdateOnboardedResources updates the resources onboarded to the DMS.
UsageMonitor
signature: UsageMonitor() types.UsageMonitor
input: None
output: types.UsageMonitor instance
output (error): None
UsageMonitor returns the types.UsageMonitor instance.
This interface defines methods to monitor the system usage. The methods are explained below.
GetUsage
signature: GetUsage(context.Context) (types.Resource, error)
input: Context
output: types.Resource
output (error): Error message
GetUsage returns the resources currently used by the machine.
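Consolidating the documented signatures, a hedged Go sketch of the two interfaces follows (placeholder types stand in for the ones defined in the types package; package and import boilerplate is omitted):

```go
// Placeholder types standing in for the ones in the types package.
type (
	Resources          struct{}
	FreeResources      struct{}
	OnboardedResources struct{}
	ResourceAllocation struct{}
)

// UsageMonitor reports the machine's overall resource usage.
type UsageMonitor interface {
	GetUsage(ctx context.Context) (Resources, error)
}

// Manager ties allocation accounting to usage monitoring.
type Manager interface {
	AllocateResources(ctx context.Context, alloc ResourceAllocation) error
	DeallocateResources(ctx context.Context, jobID string) error
	GetTotalAllocation() (Resources, error)
	GetFreeResources() (FreeResources, error)
	GetOnboardedResources(ctx context.Context) (OnboardedResources, error)
	UpdateOnboardedResources(ctx context.Context, onboarded OnboardedResources) error
	UsageMonitor() UsageMonitor
}
```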
types.Resources: resources defined for the machine.
types.AvailableResources: resources onboarded to NuNet.
types.FreeResources: resources currently available for new jobs.
types.ResourceAllocation: resources allocated to a job.
types.MachineResources: resources available on the machine.
types.GPUVendor: GPU vendors available on the machine.
types.GPU: GPU details.
types.GPUs: A slice of GPU.
types.CPU: CPU details.
types.RAM: RAM details.
types.Disk: Disk details.
types.NetworkInfo: Network details.
In NuNet, compute workloads are structured as compute ensembles. Here, we discuss how an ensemble can be created, deployed, and supervised in the NuNet network.
An ensemble is a collection of logical nodes and allocations. Nodes represent the hardware where the compute workloads run. Allocations are the individual compute jobs that comprise the workload. Each allocation is assigned to a node, and a node can have multiple allocations assigned to it.
All allocations in the ensemble are assigned a private IP address in the 10/8 range and are connected with a virtual private network, implemented using IP over libp2p. All allocations can reach each other through the VPN. Allocation IP addresses can be discovered internally in the ensemble using DNS: each allocation has a name and a DNS name, which by default is just the allocation name in the .internal domain.
Allocation and Node names within an ensemble must be unique. The ensemble as a whole has a globally unique ID (a random UUID).
In order to deploy an ensemble, the user must specify its structure and constraints; this is done with a YAML file encoding the ensemble configuration data structure; the fields of the configuration structure are described in detail in this reference.
Fundamentally the ensemble configuration has the following structure:
A map of allocations, mapping allocation names to configuration for individual allocations.
A map of nodes, mapping node names to configuration for individual nodes.
A list of edges between nodes, encoding specific logical edge constraints.
There are additional fields in the data structure which allow us to include ssh keys and scripts in the configuration, as well as supervision strategy policies.
An allocation's configuration has the following structure:
The name of the allocation executor; this is the environment in which the actual compute job is executed. We currently support Docker and Firecracker VMs, but we plan to also support WASM and generally any sandbox/VM that makes sense for users.
The resources required to run the allocation, such as memory, CPU cores, GPUs, and so on.
The execution details, which encode the executor-specific configuration of the allocation.
The DNS name for internal name resolution of the allocation. This can be omitted, in which case the allocation's name becomes the DNS name.
The list of ssh keys to drop in the allocation, so that administrators can ssh into the allocation.
The list of scripts to execute during provisioning, in execution order.
Finally, the user can also specify the application-specific health check to be performed by the supervisor, so that the health of the application can be ascertained and failures detected.
A node's configuration has the following structure:
The list of allocations that are assigned to the node.
The configuration for mapping public ports to ports in allocations.
The location constraints for the node.
An optional field for explicitly specifying the peer on which the node should be assigned, allowing users and organizations to bring their own nodes into the mix, for instance for hosting sensitive data.
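As a rough illustration of the shape described above, a Go mirror of the configuration might look as follows (all field names and tags are hypothetical; the canonical schema is defined by the ensemble configuration reference):

```go
// EnsembleConfig is a hypothetical mirror of the YAML configuration.
type EnsembleConfig struct {
	Allocations map[string]AllocationConfig `yaml:"allocations"`
	Nodes       map[string]NodeConfig       `yaml:"nodes"`
	Edges       []EdgeConstraint            `yaml:"edges"`
	SSHKeys     []string                    `yaml:"ssh_keys"`
	Scripts     []string                    `yaml:"scripts"`
}

type AllocationConfig struct {
	Executor    string            `yaml:"executor"` // e.g. "docker" or "firecracker"
	Resources   ResourceSpec      `yaml:"resources"`
	Execution   map[string]string `yaml:"execution"` // executor-specific details
	DNSName     string            `yaml:"dns_name"`  // defaults to the allocation name
	SSHKeys     []string          `yaml:"ssh_keys"`
	Provision   []string          `yaml:"provision"` // scripts, in execution order
	HealthCheck string            `yaml:"health_check"`
}

type NodeConfig struct {
	Allocations []string      `yaml:"allocations"`
	Ports       []PortMapping `yaml:"ports"`
	Location    LocationSpec  `yaml:"location"`
	Peer        string        `yaml:"peer"` // optional explicit peer pinning
}

// Empty placeholders; the real specs carry detailed fields.
type (
	ResourceSpec   struct{}
	PortMapping    struct{}
	LocationSpec   struct{}
	EdgeConstraint struct{}
)
```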
In the near future, we also plan to support directly parsing Kubernetes job description files. We also plan to provide a declarative format for specifying large ensembles, so that it is possible to succinctly describe a 10k GPU ensemble for training an LLM, and so on.
It is worth reiterating that ensembles carry with them the constraints specified by the user. This allows the user to have fine-grained control of their ensemble deployment and ensure that certain requirements are met.
In DMS v0.5 we support the following constraints:
Resources for an allocation, such as memory, core count, gpu details, and so on.
Location for nodes; the user can specify the region, city, etc., all the way to choosing a particular ISP. Location constraints can also be negative, so that a node will not be deployed in certain locations, e.g. because of regulatory considerations such as GDPR.
Edge Constraints, which specify the relationship between nodes in the ensemble in terms of available bandwidth and round trip time.
In subsequent releases we plan to add additional constraints (e.g. existence of a contract, price range, explicit datacenter placement, energy sources and so on) and generalize the constraint expression language as graphs.
Given an ensemble specification, the core functionality of the NuNet network is to find and assign peers to nodes such that the constraints of the ensemble are satisfied. The system treats the deployment as a constraint satisfaction problem over permutations of available peers (compute nodes) on which the user is authorized to deploy. The process of deploying an ensemble is called orchestration. In the following, we summarize how deployment orchestration is performed.
Ensemble deployment is initiated by a user invoking the /dms/node/deployment/new behavior on a node which is willing to run an orchestrator for them; this can be just the user's private DMS running on their laptop. The node accepting the invocation creates the orchestrator actor inside its process space, initiates the deployment orchestration, and returns the ensemble identifier to the user. The user can use this identifier to poll the status of the deployment and control the ensemble through the orchestrator actor. The user also specifies a timeout on how long the deployment process may take before declaring failure. This is simply the expiration on the message that invokes /dms/node/deployment/new.
The orchestrator then proceeds to request bids for each node in the ensemble. This is accomplished by broadcasting a message to the /dms/deployment/request behavior in the /nunet/deployment broadcast topic. The deployment request contains a mapping of node names in the ensemble to their aggregate (over all allocations to be assigned to the node) resource constraints, together with location and other constraints that can restrict the search space.
In order for this to proceed, the orchestrator must have the appropriate capabilities; only provider nodes that accept the user's capabilities will respond to the broadcast message. The response to the bid request is a bid for a node in the ensemble, sent as a message to the /dms/deployment/bid behavior on the orchestrator. This also implies that the nodes that submit such bids must have appropriate capabilities accepted by the orchestrator.
Given the appropriate capabilities, the orchestrator collects bids until it has a sufficient number of them or a timeout expires; the timeout ensures prompt progress in the deployment. If the orchestrator doesn't have bids for all nodes, it rebroadcasts its bid request, excluding peers that have already submitted a bid. This continues until there are bids for all nodes or the deployment times out, at which point a deployment failure is declared.
Note that in the case of node pinning, where a specific peer is assigned to an ensemble node in advance (i.e. when a user brings their own nodes into the ensemble), bid requests are not broadcast but rather invoked directly on the peer.
Next, the orchestrator generates permutations of assignments of peers to nodes and evaluates the constraints. Some constraints can be rejected directly without measurement; for instance, round trip latency constraints can be rejected using speed-of-light calculations that provide a lower bound on physically realizable latency. We plan to do the same with bandwidth constraints, given the node's measured link capacity and the throughput bound equation that governs TCP's behavior given bottleneck bandwidth and RTT.
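For example, a speed-of-light rejection test could look like the following sketch (constants and coordinates are illustrative; real fiber paths are longer than the great-circle distance, so this only rejects assignments that are certainly infeasible):

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// greatCircleKm returns the great-circle distance between two points
// given as latitude/longitude in degrees (haversine formula).
func greatCircleKm(lat1, lon1, lat2, lon2 float64) float64 {
	const earthRadiusKm = 6371.0
	rad := math.Pi / 180.0
	dLat := (lat2 - lat1) * rad
	dLon := (lon2 - lon1) * rad
	a := math.Sin(dLat/2)*math.Sin(dLat/2) +
		math.Cos(lat1*rad)*math.Cos(lat2*rad)*math.Sin(dLon/2)*math.Sin(dLon/2)
	return 2 * earthRadiusKm * math.Asin(math.Sqrt(a))
}

// minRTT is a physical lower bound on round trip time: light in fiber
// travels at roughly 2/3 of c (~200 km/ms), and the signal must cover
// the distance twice.
func minRTT(distanceKm float64) time.Duration {
	const fiberSpeedKmPerMs = 200.0
	ms := 2 * distanceKm / fiberSpeedKmPerMs
	return time.Duration(ms * float64(time.Millisecond))
}

func main() {
	// Example: Amsterdam to New York; any RTT constraint tighter than
	// this bound can be rejected without measurement.
	d := greatCircleKm(52.37, 4.90, 40.71, -74.01)
	fmt.Printf("distance: %.0f km, RTT lower bound: %v\n", d, minRTT(d))
}
```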
Once a candidate assignment is deemed viable, the orchestrator proceeds to measure specific constraints for satisfiability. This involves measuring round trip time and bandwidth between node pairs, and is accomplished by invoking the /dms/deployment/constraint/edge behavior.
If a candidate assignment satisfies the constraints, the orchestrator proceeds with committing and provisioning the deployment. This is done with a two-phase commit process: first the orchestrator sends a commit message to all peers to ensure that the resources are still available (nodes don't lock resources when submitting a bid), by invoking the /dms/deployment/commit behavior. If any node fails to commit, the candidate deployment is reverted and the orchestrator starts anew; the revert happens with the /dms/deployment/revert behavior.
If all nodes successfully commit, the orchestrator proceeds to provision the deployment by sending allocation details to the relevant nodes and creating the VPN. This is initiated by invoking the /dms/deployment/allocate behavior on the provider nodes, which creates a new allocation actor. Subsequently, the orchestrator assigns IP addresses to allocations and creates the VPN (what we call the subnet) by invoking the appropriate behaviors on the allocation actors, and then starts the allocations. Once all nodes provision, the deployment is considered running and enters supervision.
The deployment will keep running until the user shuts it down, as long as the user's agreement with the provider is active; in the near future we will also support explicitly specifying durations for running ensembles, and the ability to modify running ensembles in order to support mechanisms like auto scaling.
TODO
In order to discuss the authorization flow for deployment in the NuNet network, we need to distinguish certain actors in the system over the course of an ensemble's lifetime.
Specifically, we introduce the following notation:
Let's call U the user as an actor.
Let's call O the orchestrator, which is an actor living inside a DMS instance (node) for which the user is authorized to initiate a deployment. We call the node where the orchestrator runs N_o. Note that the DID of the orchestrator actor will be the same as the DID of the node on which it runs, but it will have an ephemeral actor ID.
Let's call P_i the set of compute providers that are willing to accept deployment requests from U.
Let's call N_{P_i,j} the DMS nodes controlled by the providers that are willing to accept deployments from users.
And finally, let's call A_i the allocation actor for each running allocation. The DID of each allocation actor will be the same as the DID of the node on which the allocation is running, but it will have an ephemeral actor ID.
Also note that we have certain identifiers pertaining to these actors; let's define the following notation:
DID(x) is the DID of actor x; in general this is the DID that identifies the node on which the actor is running.
ID(x) is the ID of actor x; this is generally ephemeral, except for node root actors, which have persistent identities matching their DID.
Peer(x) is the peer ID of a node/actor x.
Root(x) is the DID of the root anchor of trust for the node/actor x.
Using the notation above, we can enumerate the behavior namespaces and requisite capabilities for deployment of an ensemble:
Invocations from U to N_o are in the /dms/node/deployment namespace.
Invocations from O to N_{P_i,j} for deployment bids: broadcast /dms/deployment/request via the /nunet/deployment topic, or unicast /dms/deployment/request for pinned ensemble nodes.
Messages from N_{P_i,j} to O: /dms/deployment/bid as the reply to a bid request.
Invocations from O to N_{P_i,j} for deployment control are in the /dms/deployment namespace.
Invocations from O to A_i are in the /dms/allocation namespace and are dynamically granted programmatically.
Invocations from O to N_{P_i,j} for allocation control are in the dynamic /dms/ensemble/<ensemble-id> namespace and are dynamically granted programmatically.
This creates the following structure:
U must be authorized with the /dms/node/deployment capability in N_o.
N_o must be authorized with the /dms/deployment capability in N_{P_i,j} so that the orchestrator can make the appropriate invocations.
N_{P_i,j} must be authorized with the /dms/deployment/bid capability on N_o so that it can submit bids to the orchestrator.
Note that the decentralized structure and fine-grained capability model of the NuActor system allows for very tight access control. This ensures that:
Orchestrators can only run on DMS instances where the user is authorized to initiate deployment.
Bid requests will only be accepted by provider DMS instances where the user is authorized to deploy.
Bids will only be accepted by provider DMS instances whom the user has authorized.
In the following we examine common functional scenarios on how to set up the system so that deployments are properly authorized.
TODO
TODO
TODO
TODO
The orchestrator is responsible for job scheduling and management (manages jobs on other DMSs).
A key distinction to note is the option of two types of orchestration mechanisms: push and pull. Broadly speaking, pull orchestration works on the premise that resource providers bid for jobs available in the network, while push orchestration works when a job is pushed directly to a known resource provider, constituting a more centralized orchestration. push orchestration develops on the idea that users choose from the available providers and their resources. However, given the decentralized and open nature of the platform, it may be required to engage the providers to get their current (latest) state and preferences. This leads to an overlap with the pull orchestration approach.
The default setting is to use pull-based orchestration, which is developed in the present proposed specification.
proposed
Job Orchestration
The proposed lifecycle of a job on the NuNet platform consists of various operations, from job posting to settlement of the contract. Below is a brief explanation of the steps involved in job orchestration:
Job Posting: The user posts a job request to the DMS. The job request is validated and a Nunet job is created in the DMS.
Search and Match:
a. The service provider DMS requests bids from other nodes in the network.
b. The DMS on the compute provider compares the capability of the available resources against the job requirements. If all the requirements are met, it then decides whether to submit a bid.
c. The received bids are assessed and the best bid is selected.
Job Request: In case the shortlisted compute provider has not locked the resources while submitting the bid, the job request workflow is executed. This requires the compute provider DMS to lock the necessary resources required for the job and re-submit the bid. Note that at this stage the compute provider can still decline the job request.
Contract Closure: The service provider and the shortlisted compute provider verify that the counterparty is a verified entity approved by NuNet Solutions to participate in the network. This is an important step to establish trust before any work is performed.
If the job does not require any payment (Volunteer Compute), a contract is generated by both the Service Provider and Compute Provider DMS. This is then verified by the Contract-Database. Otherwise, proof of contract needs to be received from the Contract-Database before work starts.
Invocation and Allocation: When the contract closure workflow is completed, both the service provider and the compute provider DMS have an agreement and proof of contract with them. The service provider DMS will then send an invocation to the compute provider DMS, which results in a job allocation being created. An Allocation can be understood as an execution space/environment on actual hardware that enables a job to be executed.
Job Execution: Once the allocation is created, job execution starts on the compute provider machine.
Contract Settlement: After the job is completed, the service provider DMS verifies the work done. If the work is correct, the Contract-Database makes the necessary transactions to settle the contract.
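As an illustration of the bid assessment in step 2c, here is one hypothetical way to score and select bids; the real selection criteria are part of the proposed design and may weigh price, time, and other factors differently:

```go
// Bid is a hypothetical shape for a received bid; the proposed
// dms.orchestrator.Bid carries price and time information.
type Bid struct {
	Provider string
	Price    float64 // quoted cost for the job
	Hours    float64 // estimated completion time
}

// selectBestBid picks the bid with the lowest weighted score; the
// weighting here is arbitrary and purely illustrative.
func selectBestBid(bids []Bid) (Bid, bool) {
	if len(bids) == 0 {
		return Bid{}, false
	}
	score := func(b Bid) float64 { return b.Price + 0.5*b.Hours }
	best := bids[0]
	for _, b := range bids[1:] {
		if score(b) < score(best) {
			best = b
		}
	}
	return best, true
}
```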
Here is a quick overview of the contents of this directory:
: Current file, aimed at developers who wish to use and modify the orchestrator functionality.
: Directory containing package specifications, including the package class diagram.
: Defines and implements interfaces of graph logic for network topology awareness (proposed).
Subpackages
Source
Rendered from source file
TBD
TBD
TBD
List of issues
All issues that are related to the implementation of the dms package can be found below. These include any proposals for modifications to the package or new functionality needed to cover the requirements of other packages.
Interfaces & Methods
proposed
Orchestrator interface
publishBidRequest: sends a request for bids to the network for a particular job. This will depend on the network package for propagation of the request to other nodes in the network.
compareCapability: compares two capabilities and returns a CapabilityComparison object. Expected usage is to compare the capability required by a job with the available capability of a node.
acceptJob: looks at the comparison between the capabilities and preferences of a node, in the form of a CapabilityComparator object, and decides whether to accept a job or not.
sendBid: sends a bid to the node that propagated the BidRequest.
selectBestBid: looks at all the bids received and selects the best one.
sendJobRequest: sends a job request to the shortlisted node whose bid was selected. The compute provider node needs to accept the job request and lock its resources for the job. In case resources are already locked while submitting the bid, this step may be skipped.
sendInvocation: sends an invocation request (as a message) to the node that accepted the job. This message should have all the necessary information to start an Allocation for the job.
orchestrateJob: this will be called when a job is received via the postJob endpoint. It will start the orchestration process. It is also possible that this method could be called via a timer for jobs scheduled in the future.
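A hypothetical Go rendering of this proposed interface, with signatures inferred from the descriptions above (all placeholder types stand in for the proposed data types below):

```go
// Placeholder types for illustration.
type (
	BidRequest           struct{}
	Bid                  struct{}
	Capability           struct{}
	CapabilityComparator struct{}
	CapabilityComparison struct{}
	Invocation           struct{}
	Job                  struct{}
	NodeID               string
)

// Orchestrator is a hedged sketch, not the canonical interface.
type Orchestrator interface {
	PublishBidRequest(req BidRequest) error
	CompareCapability(required, available Capability) CapabilityComparison
	AcceptJob(cmp CapabilityComparator) bool
	SendBid(bid Bid, to NodeID) error
	SelectBestBid(bids []Bid) (Bid, error)
	SendJobRequest(to NodeID) error
	SendInvocation(inv Invocation, to NodeID) error
	OrchestrateJob(job Job) error
}
```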
proposed
Actor interface
sendMessage: sends a message to another actor (Node / Allocation).
processMessage: processes the message received and decides on what action to take.
proposed
Mailbox interface
receiveMessage: receives a message from another Node and converts it into a telemetry.Message object.
handleMessage: processes the message received.
triggerBehavior: this is where the actions taken by the actor based on the message received will be defined.
getKnownTopics: retrieves the gossipsub topics known to the node.
getSubscribedTopics: retrieves the gossipsub topics the node is subscribed to.
subscribeToTopic: subscribes to a gossipsub topic.
unsubscribeFromTopic: unsubscribes from a gossipsub topic.
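Hedged sketches of the proposed Actor and Mailbox interfaces, with names and signatures inferred from the descriptions above:

```go
// Placeholder types for illustration.
type (
	ActorID string
	Message struct{} // stands in for telemetry.Message
)

// Actor is a hypothetical sketch of the proposed interface.
type Actor interface {
	SendMessage(to ActorID, msg Message) error
	ProcessMessage(msg Message) error
}

// Mailbox is a hypothetical sketch of the proposed interface.
type Mailbox interface {
	ReceiveMessage(raw []byte) (Message, error)
	HandleMessage(msg Message) error
	TriggerBehavior(msg Message) error
	GetKnownTopics() ([]string, error)
	GetSubscribedTopics() ([]string, error)
	SubscribeToTopic(topic string) error
	UnsubscribeFromTopic(topic string) error
}
```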
proposed
Other methods
Methods for job request functionality: a. check whether resources are locked, b. lock resources, c. accept job request.
Methods for contract closure: a. validate the other node as a registered entity, b. generate contract, c. KYC validation.
Methods for job execution: a. handle job updates.
Methods for contract settlement: a. job verification.
Note that the above methods are not an exhaustive list; they are to be considered suggestions. The developer implementing the orchestrator functionality is free to make modifications as necessary.
Data types
proposed
dms.orchestrator.Actor: An Actor has an identifier and a mailbox to send/receive messages.
proposed
dms.orchestrator.Bid: Consists of information sent by the compute provider node to the requestor node as a bid for the job broadcast to the network.
proposed
dms.orchestrator.BidRequest: A bid request is a message sent by a node to the network to request bids.
proposed
dms.orchestrator.PriceBid: Contains price-related information of the bid.
proposed
dms.orchestrator.TimeBid: Contains time-related information of the bid.
proposed
dms.orchestrator.CapabilityComparator: Preferences of the node which have an influence on the comparison operation.
TBD
proposed
dms.orchestrator.CapabilityComparison: Result of the comparison operation.
TBD
proposed
dms.orchestrator.Invocation: An invocation is a message sent by the orchestrator to the node that accepted the job. It contains the job details and the contract.
proposed
dms.orchestrator.Mailbox: A mailbox is a communication channel between two actors. It uses the network package functionality to send and receive messages.
proposed
Other data types
Data types related to allocation, contract settlement, job updates, etc. are currently omitted. These should be added as applicable during implementation.
Orchestration steps research blogs
The orchestrator functionality of DMS is being developed based on the research done in the following blogs:
See the related research blogs section for more details on this topic.
Note: the functionality of DMS is currently being developed. See the proposed section for the suggested design of interfaces and methods.
Note: the functionality of DMS is currently being developed. See the proposed section for the suggested data types.