
Last updated: 2025-01-22 01:11:30.544393 File source: link on GitLab

Submit a Compute Job

Actors

  • Primary Actor: AI Developer

  • Supporting Actors:

    • Compute Provider

    • Service Provider

    • Result Storage Service

Goal in Context:

The user wants to submit a compute job to a decentralized compute platform, using DID Auth for secure authentication.

Preconditions:

  • The user has a DID and wallet for signing challenges.

  • The Compute Service and Service Providers are operational.

Trigger:

The user submits a compute job request.

Main success scenario

The main success scenario for this use case is described in the nunet/device-management-service/dms/jobs package documentation as general NuNet platform functionality (see the package documentation). The use case is a straightforward extension of the device-management-service functionality. We shall adapt this scenario to correspond to platform functionality and expose it to external users, based on the functional requirements provided and on the scenario as seen from the external user's perspective.

The originally described scenario follows:

  1. User submits job: The user sends a job request to the Compute Service, including their DID and job details.

  2. Authenticate user:

    1. The Compute Service generates a cryptographic challenge and sends it to the user.

    2. The user signs the challenge with their DID private key and sends the signature back.

    3. The Compute Service verifies the signature using the DID Document.

  3. Allocate resources: The Task Scheduler allocates resources for the job and tags it with the user’s DID for authorization.

  4. Execute job: The compute job is executed on the allocated resources.

  5. Store results: The Result Storage Service stores the output of the compute job, linked to the user’s DID.

  6. Notify user: The user is notified that the job has completed, with a link to the results.
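The authentication step of the scenario above (challenge, signature, verification) can be illustrated with a minimal Python sketch. It is not the device-management-service implementation: the class `ComputeService`, its methods, and the DID `did:example:alice` are hypothetical names, and DID resolution is reduced to a dictionary lookup. A Lamport one-time signature stands in for the real signature scheme only because it can be built from the standard library; an actual DID method would typically use a scheme such as Ed25519.

```python
import hashlib
import secrets


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def _bits(message: bytes) -> list:
    """Return the 256 bits of SHA-256(message), most significant bit first."""
    digest = _h(message)
    return [(digest[i // 8] >> (7 - i % 8)) & 1 for i in range(256)]


# Stand-in signature scheme (assumption): Lamport one-time signatures.
# A key pair must be used to sign one message only.
def generate_keypair():
    private = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(256)]
    public = [(_h(zero), _h(one)) for zero, one in private]
    return private, public


def sign(private, message: bytes) -> list:
    # Reveal one secret per message bit.
    return [private[i][bit] for i, bit in enumerate(_bits(message))]


def verify(public, message: bytes, signature) -> bool:
    if len(signature) != 256:
        return False
    return all(
        _h(part) == public[i][bit]
        for i, (part, bit) in enumerate(zip(signature, _bits(message)))
    )


class ComputeService:
    """Hypothetical service side of the challenge-response exchange."""

    def __init__(self, did_documents: dict):
        # DID -> public key; a stand-in for resolving a DID Document.
        self.did_documents = did_documents
        self.pending = {}  # DID -> outstanding challenge

    def issue_challenge(self, did: str) -> bytes:
        challenge = secrets.token_bytes(32)
        self.pending[did] = challenge
        return challenge

    def verify_response(self, did: str, signature) -> bool:
        # pop() makes each challenge single-use, so a replayed signature fails.
        challenge = self.pending.pop(did, None)
        public = self.did_documents.get(did)
        if challenge is None or public is None:
            return False
        return verify(public, challenge, signature)


# Usage: the user signs the challenge with the private key behind their DID.
private_key, public_key = generate_keypair()
service = ComputeService({"did:example:alice": public_key})

challenge = service.issue_challenge("did:example:alice")
signature = sign(private_key, challenge)
accepted = service.verify_response("did:example:alice", signature)
replayed = service.verify_response("did:example:alice", signature)
print(accepted, replayed)
```

The first response is accepted because the signature matches the public key registered for the DID. The second call fails because the challenge was consumed by the first verification, which is one simple way to rule out replayed signatures.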

Functional Requirements

These requirements are to be rewritten as Gherkin features, serving as use-case-specific regression tests, and integrated into the test suite. As noted above, most of this functionality is already implemented as general device-management-service functionality.

  1. Submit a Compute Job: As a user, I want to submit a compute job securely using my DID so that I can process my data on the decentralized platform.

    1. The Compute Service must authenticate the user using DID Auth before accepting the job request.

    2. The Task Scheduler must allocate resources for the job and tag it with the user’s DID.

    3. The system must store job details and associate outputs with the user’s DID.

    4. The user must be notified when the job starts and when it is completed.

  2. Retrieve Compute Results: As a user, I want to retrieve the results of my compute job securely so that I can access the output linked to my DID.

    1. The Result Storage Service must verify the user’s DID ownership before granting access to the job results.

    2. If authentication fails, access to the results must be denied.

    3. The user must receive a notification with a link to the results once they are authenticated.

  3. Cancel Compute Job: As a user, I want to cancel a running compute job securely so that I can stop unnecessary resource usage.

    1. The Compute Service must verify the user’s DID ownership of the job before processing the cancellation request.

    2. The Task Scheduler must terminate the job and release resources upon successful authentication.

    3. The user must receive a confirmation message once the job is canceled.

  4. Insufficient resources:

    1. If resources are unavailable, the Service Provider / Orchestrator queues the job or notifies the user of a delay.
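The requirements above share three mechanics: every job is tagged with the submitting DID, retrieval and cancellation check that tag, and jobs wait in a queue when resources are short. The following Python sketch shows one way these could fit together. All names (`Scheduler`, `Job`, the CPU-count resource model, the notification list) are illustrative assumptions rather than the device-management-service API, and the caller's DID is assumed to have been authenticated already.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Job:
    job_id: int
    owner_did: str  # DID tag used for authorization
    cpus: int
    state: str = "queued"  # queued -> running -> completed | cancelled
    result: Optional[str] = None


class Scheduler:
    """Hypothetical scheduler; resources are modelled as a CPU count."""

    def __init__(self, total_cpus: int):
        self.total_cpus = total_cpus
        self.free_cpus = total_cpus
        self.jobs = {}
        self.queue = deque()
        self.notifications = []  # (did, message) pairs

    def _notify(self, did: str, message: str) -> None:
        self.notifications.append((did, message))

    def _owned(self, did: str, job_id: int) -> Job:
        job = self.jobs.get(job_id)
        if job is None or job.owner_did != did:
            raise PermissionError("caller DID does not own this job")
        return job

    def _start_queued(self) -> None:
        # First in, first out: start jobs from the head while they fit.
        while self.queue and self.jobs[self.queue[0]].cpus <= self.free_cpus:
            job = self.jobs[self.queue.popleft()]
            self.free_cpus -= job.cpus
            job.state = "running"
            self._notify(job.owner_did, f"job {job.job_id} started")

    def submit(self, did: str, cpus: int) -> int:
        if cpus > self.total_cpus:
            raise ValueError("request exceeds total capacity")
        job = Job(len(self.jobs) + 1, did, cpus)
        self.jobs[job.job_id] = job
        self.queue.append(job.job_id)
        self._start_queued()
        if job.state == "queued":
            self._notify(did, f"job {job.job_id} delayed: waiting for resources")
        return job.job_id

    def complete(self, job_id: int, result: str) -> None:
        job = self.jobs[job_id]
        if job.state != "running":
            raise ValueError("job is not running")
        job.state, job.result = "completed", result
        self.free_cpus += job.cpus
        self._notify(job.owner_did, f"job {job.job_id} completed")
        self._start_queued()

    def cancel(self, did: str, job_id: int) -> None:
        job = self._owned(did, job_id)
        if job.state == "running":
            self.free_cpus += job.cpus  # release resources
        elif job.state == "queued":
            self.queue.remove(job_id)
        else:
            raise ValueError("job already finished")
        job.state = "cancelled"
        self._notify(did, f"job {job.job_id} cancelled")
        self._start_queued()

    def results(self, did: str, job_id: int) -> Optional[str]:
        return self._owned(did, job_id).result


# Usage: Bob's job waits until Alice's job releases its CPUs.
scheduler = Scheduler(total_cpus=4)
first = scheduler.submit("did:example:alice", cpus=3)
second = scheduler.submit("did:example:bob", cpus=2)
scheduler.complete(first, "output-of-job-1")
print(scheduler.jobs[second].state)
```

In this sketch a delayed job is both queued and reported to its owner, whereas the requirement allows either. Denying access with the same error for unknown jobs and for jobs owned by another DID avoids revealing which job identifiers exist.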

Diagrams

Note: internal actors (orchestrator, allocation, etc.) will most likely need to be removed, as they are not relevant from the use-case perspective.
