Decentralized GPU ML Cloud

For up-to-date development updates please view our Project Development Blogs or follow our journey on GitLab.

See the project scoping discussion with external stakeholders and the full description in our Cardano Catalyst Fund8 proposal on the Catalyst platform.

Problem Statement

Artificial Intelligence (Machine Learning) models need GPU processing power; the question is how to provide that power in a decentralized way to grow Cardano. Applications running on Cardano, as well as SPOs, need computing power in the form of CPUs or GPUs. Currently, the only options are to rent cloud computing from big tech, which increases reliance on such companies, or to purchase costly hardware setups. In an increasingly hostile and censorship-prone environment, it is essential to secure the reliability and decentralization of Cardano.

Computing needs in the Cardano ecosystem can broadly be divided into:

1. CPU requirements - Stake Pool Operators

2. GPU requirements - Artificial Intelligence (Machine Learning), Dapps, Metaverse, others.

Allowing decentralized computing on CPUs is a prerequisite for running Cardano Nodes via NuNet, a project which has already been awarded funding from Cardano Catalyst Fund7 as one of the top 20 voted proposals.

Phase 1: Foundation - One User Per GPU Model

This model involves getting NuNet containers to support GPU access, monitoring GPU resource usage, and making GPUs directly available to the processes running inside the containers. The GPUs utilized in this model will initially be those available on the specific provider device.
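As a rough illustration of checking GPU access from inside a container, the sketch below shells out to nvidia-smi and parses its CSV output. The helper names are assumptions for illustration; NuNet's actual container integration is not described here.

```python
import shutil
import subprocess

# nvidia-smi query for per-GPU index, name, utilization and memory figures.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,name,utilization.gpu,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]

def parse_gpu_csv(output: str) -> list[dict]:
    """Parse the CSV rows emitted by the nvidia-smi query above."""
    gpus = []
    for line in output.strip().splitlines():
        idx, name, util, mem_used, mem_total = [f.strip() for f in line.split(",")]
        gpus.append({
            "index": int(idx),
            "name": name,
            "utilization_pct": int(util),
            "memory_used_mib": int(mem_used),
            "memory_total_mib": int(mem_total),
        })
    return gpus

def visible_gpus() -> list[dict]:
    """Return GPUs visible inside this container, or [] if none / no driver."""
    if shutil.which("nvidia-smi") is None:
        return []  # no NVIDIA driver exposed to the container
    try:
        out = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    except subprocess.CalledProcessError:
        return []
    return parse_gpu_csv(out.stdout)
```

A process inside the container can call `visible_gpus()` to verify that the provider's GPUs were passed through before accepting a workload.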

This model has its use cases: it allows ML model training and inference whenever the available GPU is capable of handling the workload by itself. It also serves as a foundation for the next phases of development by allowing the core work to be performed: supporting GPU device onboarding to NuNet, enabling the NuNet Adapter to manage GPUs, implementing GPU access from within virtual machines and containers, and monitoring GPU resource usage for provider compensation.
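One way to picture the "monitoring GPU resource usage for provider compensation" step is a meter that accumulates utilization-weighted GPU time from periodic samples. Billing in GPU-seconds and the class below are illustrative assumptions, not NuNet's actual compensation formula.

```python
from dataclasses import dataclass

@dataclass
class GpuMeter:
    """Accumulates GPU usage from periodic utilization samples.

    Each sample is assumed to cover one sampling interval at the
    reported utilization percentage.
    """
    interval_s: float = 5.0   # sampling period in seconds
    gpu_seconds: float = 0.0  # utilization-weighted GPU time so far

    def record(self, utilization_pct: float) -> None:
        # One sample covers `interval_s` seconds at the given utilization.
        self.gpu_seconds += self.interval_s * (utilization_pct / 100.0)

    def owed(self, rate_per_gpu_second: float) -> float:
        # Compensation owed to the provider at a per-GPU-second rate.
        return self.gpu_seconds * rate_per_gpu_second
```

For example, three 10-second samples at 100%, 50% and 0% utilization accumulate 15 GPU-seconds, which the provider is then paid for at the agreed rate.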

Regular personal computers generally lack the GPU capacity for large workloads, so this model will be limited in its ability to support large-scale ML projects, and especially federated learning, where data should not be transmitted to the device where the GPU is located. A model that decouples data storage from the GPU device used for training is necessary so that users need not upload their data to a Provider's device in order to perform training. It should be possible to relay only the specific tasks and processes that need GPU execution to the Provider's devices, without transmitting the full training data, i.e., the process is transmitted instead of the code and data.
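The decoupled model above could be sketched as a job descriptor that ships only references, never the dataset itself. The field names and the serialization format here are hypothetical; NuNet's actual job format is not specified in this document.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class TrainingJob:
    """Describes a GPU task by reference: code and data stay where they live.

    All field names are illustrative assumptions, not NuNet's job schema.
    """
    image: str        # container image holding the training code
    entrypoint: str   # command to run inside the container
    dataset_uri: str  # location the training process resolves at run time
    gpu_mem_mib: int  # minimum GPU memory required by the workload

def to_wire(job: TrainingJob) -> str:
    """Serialize the descriptor; no raw training data is embedded."""
    return json.dumps(asdict(job))
```

The point of the sketch is that the message relayed to the Provider is a few hundred bytes of metadata, regardless of how large the training dataset is.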

Phase 1 is the scope of the present Cardano Catalyst Fund8 proposal.
