Extending GPU Container Support to AMD and Intel: A Developer Approach for Decentralized Scaling

Avimanyu Bandyopadhyay¹ Santosh Kumar² Tewodros Kederalah³ Dagim Sisay⁴ Dr. Kabir Veitas⁵

1. Systems Scientist, NuNet [avimanyu.bandyopadhyay@nunet.io]

Researcher, GizmoQuest Computing Lab [avimanyu@gizmoquest.com]

PhD Scholar, Heritage Institute of Technology [avimanyu.bandyopadhyay@heritageit.edu.in]

2. Full Stack Developer, NuNet [santosh.kumar@nunet.io]

3. Software Developer, NuNet [tewodros@nunet.io]

4. Tech Lead, NuNet [dagim@nunet.io]

5. Chief Executive Officer, NuNet [kabir@nunet.io]

Corresponding author: Dr. Kabir Veitas, CEO, NuNet

Abstract

The utilization of Graphics Processing Units (GPUs) has significantly enhanced the speed and efficiency of machine learning and deep learning applications. Docker containers are increasingly favored for ensuring the reproducibility and scalability of these applications, and Docker itself is written in the Go programming language. However, Docker's built-in GPU support is restricted to Nvidia GPUs, creating an obstacle to leveraging AMD and Intel GPUs. In this paper, we introduce a method, implemented in Go, to extend Docker's GPU support to AMD and Intel GPUs, offering a vendor-agnostic solution at the development level.

Introduction

The rise of machine learning and deep learning applications in fields such as computer vision, natural language processing, and bioinformatics has necessitated the use of high-performance computing resources. Graphics Processing Units (GPUs), initially designed to accelerate graphics rendering for gaming, have emerged as powerful accelerators for these data-intensive applications due to their massively parallel architectures.

In the realm of high-performance computing, reproducibility and portability of applications are essential. Docker, a platform that leverages containerization technology, provides a solution to these challenges. Docker containers encapsulate applications along with their dependencies, allowing them to run uniformly across different computational environments. Moreover, Docker's lightweight nature compared to traditional virtual machines makes it a preferable choice for deploying scalable and efficient applications.

However, while Docker has built-in support for Nvidia GPUs, it lacks the same native support for AMD and Intel GPUs. This discrepancy limits the full exploitation of the diverse GPU hardware landscape, creating vendor lock-in and potentially hindering the scalability and versatility of GPU-accelerated applications.

Docker is primarily developed in the Go programming language. Go, with its simplicity, strong concurrency model, and powerful standard library, provides a unique blend of high-level and low-level programming capabilities. This makes it an ideal candidate for developing solutions that require detailed system-level control, such as the interaction with GPUs, while maintaining an accessible and maintainable codebase.

This paper presents a novel approach for extending Docker's GPU support to AMD and Intel GPUs using the Go programming language. By addressing this problem at the development level, we aim to contribute to the open-source Docker project and pave the way for truly vendor-agnostic, scalable, and efficient GPU-accelerated applications. This development-level solution contrasts with the existing system-level workarounds and has the potential to eliminate unnecessary complexity for end-users, promoting more widespread adoption of AMD and Intel GPUs in high-performance computing.

Related Work

Several works have addressed the challenges of GPU-accelerated computing in containerized environments. The most prominent solution is the `--gpus` option provided by Docker, which offers native support for Nvidia GPUs. This feature leverages the Nvidia Container Toolkit, an open-source project that enables Docker to recognize Nvidia GPUs and allocate the necessary resources to containers.
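For concreteness, the `--gpus` flag is translated by the Docker CLI into a `DeviceRequest` entry in the container's `HostConfig`. The sketch below, written against the types in Docker's Go API, approximates that translation for `--gpus all`; the exact parsing logic lives in the CLI and may differ in detail.

```go
package main

import (
	"fmt"

	"github.com/docker/docker/api/types/container"
)

// nvidiaGPURequest approximates what the Docker CLI builds when it parses
// `--gpus all`: a DeviceRequest that the daemon matches against its
// registered GPU device drivers (currently only Nvidia's).
func nvidiaGPURequest() container.DeviceRequest {
	return container.DeviceRequest{
		Count:        -1,                  // -1 requests all available GPUs
		Capabilities: [][]string{{"gpu"}}, // matched against registered device drivers
	}
}

func main() {
	fmt.Printf("%+v\n", nvidiaGPURequest())
}
```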

However, this support is vendor-specific: while it works seamlessly for Nvidia GPUs, it provides no out-of-the-box solution for other GPU vendors such as AMD and Intel. Existing solutions therefore rely on system-level workarounds to enable AMD GPUs with Docker. AMD provides a deep learning stack built on ROCm, a platform that allows deep learning frameworks to run on AMD GPUs. A ROCm-enabled Linux kernel and the ROCk driver, along with other required kernel modules, must be installed on the host that runs the Docker containers, and the GPU device nodes are typically passed in manually (for example, via `--device=/dev/kfd --device=/dev/dri` on `docker run`).

Despite these advances, present solutions do not address the issue at the development level within Docker. They require users to perform additional system-level configuration, which increases complexity and could discourage users from adopting non-Nvidia GPUs for their applications. Furthermore, these solutions do not provide a unified, vendor-agnostic way to leverage GPUs in Docker, limiting the flexibility and scalability of GPU-accelerated applications across a diverse hardware landscape. This highlights the need for a development-level solution integrated within Docker itself, ensuring ease of use and true vendor-agnosticism.

Proposed Solution

Our proposal centers on a vendor-agnostic approach to GPU support within Docker at the development level. The goal is to use the Go programming language, Docker's implementation language, to let Docker natively detect and manage not only Nvidia GPUs but also AMD and Intel GPUs.

The first part of the plan is to enable Docker to mount the `/dev/dri` device inside the container regardless of vendor. This can be achieved by altering the `devices` parameter within the `hostConfig` structure of Docker's `containerd` module. The change is made within Docker's Go codebase, enabling Docker to detect and mount `/dev/dri` without requiring explicit bind-mount instructions at run time.
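As a point of reference, the behavior our patch makes automatic can be reproduced today from the client side. The following sketch, using Docker's Go SDK, creates a container with `/dev/dri` mapped in explicitly; the image name and command are placeholders.

```go
package main

import (
	"context"
	"log"

	"github.com/docker/docker/api/types/container"
	"github.com/docker/docker/client"
)

func main() {
	ctx := context.Background()
	cli, err := client.NewClientWithOpts(client.FromEnv, client.WithAPIVersionNegotiation())
	if err != nil {
		log.Fatal(err)
	}

	// Map the host's /dev/dri nodes into the container. Our patch performs
	// this mapping automatically inside the daemon; this sketch shows the
	// equivalent explicit request through the Go SDK.
	hostConfig := &container.HostConfig{
		Devices: []container.DeviceMapping{{
			PathOnHost:        "/dev/dri",
			PathInContainer:   "/dev/dri",
			CgroupPermissions: "rwm", // read, write, mknod
		}},
	}

	resp, err := cli.ContainerCreate(ctx,
		&container.Config{Image: "ubuntu:22.04", Cmd: []string{"ls", "/dev/dri"}},
		hostConfig, nil, nil, "")
	if err != nil {
		log.Fatal(err)
	}
	log.Println("created container", resp.ID)
}
```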

The second part is to add support for AMD and Intel GPUs in the `resources` parameter of the `hostConfig` structure. At present, Docker recognizes Nvidia GPUs via the `--gpus` flag, which internally modifies the `resources` parameter. We extend this support to AMD and Intel GPUs by allowing the `resources` parameter to identify these GPUs and allocate the necessary resources to containers. This requires extending Docker's Go codebase to incorporate the required drivers and software stacks for these GPUs.
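A minimal sketch of how such a request could look, assuming hypothetical `amd` and `intel` driver names registered with the daemon (today, Docker's device-driver registry only knows Nvidia's):

```go
package main

import (
	"fmt"

	"github.com/docker/docker/api/types/container"
)

// gpuRequest builds a DeviceRequest for the given vendor. The "amd" and
// "intel" driver names are hypothetical: they assume the extension has
// registered matching device drivers in the daemon alongside Nvidia's.
func gpuRequest(vendor string, count int) container.DeviceRequest {
	return container.DeviceRequest{
		Driver:       vendor, // e.g. "nvidia", or hypothetically "amd" / "intel"
		Count:        count,  // -1 requests every GPU of this vendor
		Capabilities: [][]string{{"gpu"}},
	}
}

func main() {
	hostConfig := &container.HostConfig{
		Resources: container.Resources{
			DeviceRequests: []container.DeviceRequest{gpuRequest("amd", -1)},
		},
	}
	fmt.Printf("%+v\n", hostConfig.Resources.DeviceRequests)
}
```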

In summary, the proposal aims to provide a unified, seamless way to use GPUs within Docker, regardless of manufacturer. The prime benefit of this approach is that it operates at the development level, avoiding intricate system-level configuration and offering a more user-friendly experience. It also lays the groundwork for supporting future GPUs from other manufacturers, improving the scalability and flexibility of GPU-accelerated applications in Docker.

Methodology

Implementing the proposed solution requires changes to Docker's Go codebase, specifically to the `hostConfig` structure in the `containerd` module. The modifications are as follows:

1. Mounting the `/dev/dri` device: In stock Docker, the `/dev/dri` device must be explicitly bind-mounted into the container at run time. We modified the `devices` attribute in the `hostConfig` structure to include `/dev/dri` by default. This adjustment lets Docker automatically mount the `/dev/dri` device inside the container, eliminating the need for explicit `--device` flags (see the sketch following this list).

2. Extending GPU support in the `resources` attribute: The `resources` attribute in the `hostConfig` structure manages resources for Nvidia GPUs via the `--gpus` flag. We broadened this mechanism to include AMD and Intel GPUs, which required extending the Go codebase to incorporate the necessary drivers and software stacks. With this change, Docker can recognize AMD and Intel GPUs and allocate the required resources to containers through the same `--gpus` flag.
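The following is a minimal sketch of modification 1; the function name, its placement, and the guard checks are illustrative rather than the exact patch applied to Docker's codebase.

```go
package main

import (
	"fmt"
	"os"

	"github.com/docker/docker/api/types/container"
)

// injectDRIDevice sketches modification 1: before a container is created,
// append /dev/dri to the HostConfig's device list so that AMD and Intel
// GPUs become visible without an explicit --device flag.
func injectDRIDevice(hostConfig *container.HostConfig) {
	const dri = "/dev/dri"
	if _, err := os.Stat(dri); err != nil {
		return // the host exposes no DRI devices; leave the config untouched
	}
	for _, d := range hostConfig.Devices {
		if d.PathOnHost == dri {
			return // the user already mapped it explicitly
		}
	}
	hostConfig.Devices = append(hostConfig.Devices, container.DeviceMapping{
		PathOnHost:        dri,
		PathInContainer:   dri,
		CgroupPermissions: "rwm",
	})
}

func main() {
	hc := &container.HostConfig{}
	injectDRIDevice(hc)
	fmt.Printf("%d device mapping(s) injected\n", len(hc.Devices))
}
```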

These changes were implemented in Go, following the language's idioms and best practices. Go's strong static typing and emphasis on simplicity and clarity were particularly helpful in preserving the readability and maintainability of the Docker codebase.

The proposed solution was tested successfully on hardware with AMD GPUs (`/dev/dri` together with `/dev/kfd`) to validate its correctness and effectiveness. The tests involved running GPU-accelerated software within Docker containers and verifying its operation and resource consumption through NuNet's Device Management Service (https://gitlab.com/nunet/device-management-service). The results indicate that the solution enables Docker to support AMD and Intel GPUs in a seamless, user-friendly manner.
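As an illustration of the kind of check involved, the sketch below inspects a container through Docker's Go SDK and reports whether the GPU device nodes were mapped in; it is illustrative, not the Device Management Service's actual test code.

```go
package main

import (
	"context"
	"log"
	"os"

	"github.com/docker/docker/client"
)

func main() {
	ctx := context.Background()
	cli, err := client.NewClientWithOpts(client.FromEnv, client.WithAPIVersionNegotiation())
	if err != nil {
		log.Fatal(err)
	}

	// Inspect the container whose ID is given as the first argument and
	// report whether the DRI (and, on AMD hosts, KFD) device nodes are mapped.
	info, err := cli.ContainerInspect(ctx, os.Args[1])
	if err != nil {
		log.Fatal(err)
	}
	for _, d := range info.HostConfig.Devices {
		if d.PathOnHost == "/dev/dri" || d.PathOnHost == "/dev/kfd" {
			log.Printf("GPU device %s is mapped to %s", d.PathOnHost, d.PathInContainer)
		}
	}
}
```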

In summary, the proposed solution was implemented and tested successfully in Docker's Go codebase. These implementation details confirm the viability of the solution and its potential to change how Docker accommodates GPUs from different manufacturers.

Discussion and Future Directions

The proposed approach effectively resolves the original challenge of delivering an intuitive, hassle-free experience for AMD and Intel GPU users working with Docker. With these modifications to Docker's Go codebase, Docker can independently recognize and employ AMD and Intel GPUs, eliminating the need for explicit bind mounts or custom scripts.

The benefits of this approach include:

1. Simplicity: Users no longer need to manually bind-mount the GPU device or write custom scripts for container setup.

2. Interoperability: The approach is vendor-neutral and compatible with Nvidia, AMD, and Intel GPUs, enhancing Docker container compatibility across diverse systems.

3. Scalability: The approach enables better hardware utilization in large, multi-GPU environments, such as high-performance computing clusters.

Nonetheless, there remain opportunities for further enhancement and research:

1. Expanded validation: While preliminary tests have yielded encouraging results, exhaustive testing on diverse hardware and software setups is needed to confirm robustness and compatibility.

2. Performance optimization: At present, the approach prioritizes functionality over optimal performance. Future work could explore strategies to improve the efficiency of GPU-accelerated applications running within Docker containers.

3. Support for additional devices: The current work concentrates on GPUs, but the approach could be broadened to other hardware accelerators that would benefit from similar treatment, such as FPGAs or TPUs.

4. Integration with orchestration tools: An important future direction is to integrate this approach with container orchestration tools such as Kubernetes, improving the scalability and manageability of GPU-accelerated workloads in distributed systems.

This study paves the way for further advancements and investigation. By extending Docker's compatibility to AMD and Intel GPUs, we have made GPU-accelerated computing more accessible and efficient for a broader set of users and applications, and future work in this area promises further improvements and breakthroughs.

Conclusion

The adoption of Docker container technology in machine learning and data science has become increasingly prevalent, largely due to its reproducibility, portability, and scalability. However, Docker's lack of broad GPU support poses an obstacle, particularly for users of non-Nvidia GPUs. To address this, we introduced a novel approach that adapts Docker's Go codebase to automatically detect and use AMD and Intel GPUs, eliminating the need for explicit bind mounts or bespoke scripts.

This approach not only resolves the immediate problem but also encourages further research and development in container-based computing. It provides a solid foundation for more efficient hardware usage in large, multi-GPU environments, such as high-performance computing clusters, and motivates future work on performance optimization, support for additional hardware devices, and integration with container orchestration tools.

In summary, our work represents a substantial step toward democratizing GPU-accelerated computing, making it more accessible and effective for a wider range of users and use cases. By extending Docker's support to AMD and Intel GPUs, we broaden the reach of container-based computing, fostering a more diverse and inclusive computational research landscape.

References

1. Docker GitHub Repository (2023). Docker. Available at: https://github.com/docker/docker-ce (Accessed: June 28, 2023).

2. NVIDIA (2023). CUDA Toolkit Documentation. Available at: https://docs.nvidia.com/cuda/index.html (Accessed: June 28, 2023).

3. NVIDIA (2023). NVIDIA cuDNN. Available at: https://developer.nvidia.com/cudnn (Accessed: June 28, 2023).

4. Zaidi, S. (2023). Vendor-agnostic Setup for Running ML & DL Experiments with GPU Support. Towards AI. Available at: https://towardsai.net/p/machine-learning/how-to-setup-a-vendor-agnostic-gpu-accelerated-deep-learning-environment (Accessed: June 28, 2023).

5. AMD Community (2023). The AMD Deep Learning Stack Using Docker. Available at: https://community.amd.com/t5/blogs/the-amd-deep-learning-stack-using-docker/ba-p/416431 (Accessed: June 28, 2023).
