ML Code Examples
Here is a list of example machine learning code URLs, each briefly explained, that you can submit to NuNet through our Service Provider Dashboard (SPD). You can also deploy and test your own ML code.
This Python code is for training and testing a simple convolutional neural network (CNN) using PyTorch on the CIFAR-10 dataset.
https://gitlab.com/nunet/ml-on-gpu/ml-on-gpu-service/-/raw/develop/examples/pytorch/cifar-10.py
1. Check for GPU availability: The code checks if a GPU is available for faster computation. If it is, the code will utilize the GPU.
2. Load and preprocess the images: The code retrieves the CIFAR-10 dataset and preprocesses the images, preparing them for the machine learning process.
3. Display random images: The code contains a function to display a few random images from the dataset that will be used for training.
4. Define a Convolutional Neural Network (CNN) architecture: The code creates a CNN architecture to guide the machine in learning how to recognize images.
5. Prepare the machine for training: The code configures the machine to follow the CNN architecture and determines the approach to learning from mistakes.
6. Train the machine: The code trains the machine using the images for 100 iterations (epochs). It keeps track of the machine's performance during training.
7. Test the trained machine: After training, the code evaluates how well the machine can identify images it has not seen before, using a test set of images.
8. Evaluate the machine's performance: The code calculates the overall accuracy of the machine in identifying the test images, as well as the accuracy for each category of images in the dataset.
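For orientation, here is a minimal sketch of the training flow described above. The layer sizes, hyperparameters, and epoch count are illustrative and may differ from the actual script:

```python
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms

# 1. Use the GPU if one is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# 2. Load and normalize CIFAR-10
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])
trainset = torchvision.datasets.CIFAR10(root="./data", train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4, shuffle=True)

# 4. A small CNN in the spirit of the example script
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(torch.relu(self.conv1(x)))
        x = self.pool(torch.relu(self.conv2(x)))
        x = torch.flatten(x, 1)
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        return self.fc3(x)

net = Net().to(device)

# 5. Loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

# 6. Train (the real script runs 100 epochs; 2 keeps this sketch quick)
for epoch in range(2):
    for inputs, labels in trainloader:
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(net(inputs), labels)
        loss.backward()
        optimizer.step()
    print(f"epoch {epoch + 1} finished, last batch loss {loss.item():.3f}")
```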
However, this code does not include a checkpointing system to save the machine's learning progress. If training is interrupted, the process will have to start from the beginning.
This code is a modified version of the previous one. The main changes are related to adding functionality for checkpointing and resuming training from a saved state.
https://gitlab.com/nunet/ml-on-gpu/ml-on-gpu-service/-/raw/develop/examples/pytorch/cifar-10_checkpointed.py
Here's the explanation:
1. Imports and configurations: The code imports the necessary libraries and sets up the device for using GPU or CPU, based on availability.
2. Data preparation: It loads and preprocesses the CIFAR-10 dataset for training and testing purposes.
3. Visualizing images: The code provides a function to display images from the dataset.
4. Defining the network architecture: The code defines a Convolutional Neural Network (CNN) with two convolutional layers, two pooling layers, and three fully connected layers.
5. Checkpoint and resume functionality: The code reads and writes the epoch count from a text file and saves the model weights in a file. If a checkpoint file exists, it resumes training from the last saved state.
6. Training the model: The training loop is modified to include saving the current model state after each epoch, and resuming from the saved state if needed.
7. Evaluating the model: The code tests the model's performance on the test dataset, calculating overall accuracy and accuracy per class.
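A minimal sketch of the checkpoint-and-resume pattern described above. The model, file names (`checkpoint.pth`, `epoch.txt`), and training step are illustrative stand-ins, not taken from the script:

```python
import os
import torch
import torch.nn as nn

net = nn.Linear(10, 2)                 # stand-in for the CNN above
optimizer = torch.optim.SGD(net.parameters(), lr=0.01)
CHECKPOINT_PATH = "checkpoint.pth"     # illustrative file names
EPOCH_FILE = "epoch.txt"
NUM_EPOCHS = 100

def train_one_epoch(model):
    # Stand-in for the real training loop: one step on dummy data
    x, y = torch.randn(8, 10), torch.randint(0, 2, (8,))
    loss = nn.functional.cross_entropy(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

start_epoch = 0
if os.path.exists(CHECKPOINT_PATH) and os.path.exists(EPOCH_FILE):
    # Resume from the last saved state
    net.load_state_dict(torch.load(CHECKPOINT_PATH))
    with open(EPOCH_FILE) as f:
        start_epoch = int(f.read().strip())

for epoch in range(start_epoch, NUM_EPOCHS):
    train_one_epoch(net)
    # Save weights and the epoch count after every epoch, so an
    # interruption loses at most one epoch of progress
    torch.save(net.state_dict(), CHECKPOINT_PATH)
    with open(EPOCH_FILE, "w") as f:
        f.write(str(epoch + 1))
```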
Overall, this version of the code is designed for checkpointing and resuming the training process, making it more convenient when training is interrupted or needs to be paused.
This code sets up a chatbot using the Gradio library for the interface and the T5 large model from Hugging Face's Transformers library as the backbone.
https://gitlab.com/nunet/ml-on-gpu/ml-on-gpu-service/-/raw/develop/examples/pytorch/flan-t5-large_chatbot-multi-gpu.py
Here's an overview of the code:
1. Import libraries: Gradio, PyTorch, and Transformers are imported.
2. Hyperparameters: The code defines various hyperparameters for the model's text generation process, such as the maximum sequence and output lengths, the number of beams for beam search, the length penalty, and more.
3. Load the model and tokenizer: The pre-trained T5 large model and its corresponding tokenizer are loaded. The model is set up to run on GPU(s) if available.
4. Define the chatbot function: The `chatbot()` function takes user input, tokenizes it, feeds it to the T5 model, generates a response, and decodes the output tokens back into text.
5. Create a Gradio interface: The Gradio library is used to create a simple user interface for interacting with the chatbot. A text input box and a text output box are provided, along with a title and description.
6. Launch the Gradio interface: The Gradio interface is launched, and a shareable link is created.
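A minimal sketch of this pattern, assuming the Hugging Face model id `google/flan-t5-large`; the generation hyperparameter values here are illustrative, not the script's:

```python
import torch
import gradio as gr
from transformers import T5Tokenizer, T5ForConditionalGeneration

# Illustrative generation hyperparameters
MAX_OUTPUT_LENGTH = 128
NUM_BEAMS = 4
LENGTH_PENALTY = 1.0

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-large")
model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-large").to(device)

def chatbot(user_input):
    # Tokenize, generate with beam search, decode back to text
    inputs = tokenizer(user_input, return_tensors="pt").to(device)
    outputs = model.generate(
        **inputs,
        max_length=MAX_OUTPUT_LENGTH,
        num_beams=NUM_BEAMS,
        length_penalty=LENGTH_PENALTY,
    )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

demo = gr.Interface(
    fn=chatbot,
    inputs=gr.Textbox(label="Your question"),
    outputs=gr.Textbox(label="Answer"),
    title="FLAN-T5 Chatbot",
    description="Ask a question and get a response from FLAN-T5.",
)
demo.launch(share=True)  # share=True creates a shareable link
```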
This code sets up a chatbot using the T5 large model, providing an easy-to-use interface for users to ask questions and receive responses.
This code sets up a chatbot that can remember the conversation and save it to a file. It uses the T5 model from the Transformers library and Gradio for a user-friendly interface. The chatbot will work on your computer's GPU if available.
https://gitlab.com/nunet/ml-on-gpu/ml-on-gpu-service/-/raw/develop/examples/pytorch/flan-t5-large_chatbot-multi-gpu_checkpointed.py
Here's what the code does:
1. It imports the needed tools (Gradio, PyTorch, os, and Transformers).
2. The code sets some rules (hyperparameters) to help the model give better answers.
3. The T5 model and tokenizer are loaded to understand and process text. The model will use your computer's GPU if possible, making it work faster.
4. The code checks if a file named "conversation.txt" exists. If not, it creates one to save the conversation.
5. A chatbot function is created that opens the conversation file, reads the previous conversation, and adds new input and output to the file. It also processes the input and generates a response using the T5 model.
6. Using Gradio, a simple chat window is created for you to ask questions and see the answers. The chatbot will remember the conversation and display the updated conversation after each response.
7. Finally, the chat window is launched, and you can share it with others if you want.
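A minimal sketch of the conversation-memory pattern; the file name "conversation.txt" comes from the description above, while `generate_response()` is a hypothetical stand-in for the T5 tokenize/generate/decode step shown earlier:

```python
import os

CONVERSATION_FILE = "conversation.txt"

def generate_response(prompt):
    # Stand-in for the T5 generation step from the previous example
    return "(model response to: " + prompt.splitlines()[-1] + ")"

# Create the conversation file if it does not exist yet
if not os.path.exists(CONVERSATION_FILE):
    open(CONVERSATION_FILE, "w").close()

def chatbot_with_memory(user_input):
    # Read the previous conversation, generate a reply,
    # then append the new exchange to the file
    with open(CONVERSATION_FILE) as f:
        history = f.read()
    response = generate_response(history + "User: " + user_input)
    with open(CONVERSATION_FILE, "a") as f:
        f.write(f"User: {user_input}\nBot: {response}\n")
    # Return the updated conversation so the interface can display it
    with open(CONVERSATION_FILE) as f:
        return f.read()

print(chatbot_with_memory("Hello!"))
```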
This code helps you set up a chatbot that can remember and save conversations, making it fun and easy to interact with the T5 model while keeping track of the discussion.
This code is similar to the previous two. It sets up a basic chatbot using the T5 large model from the Transformers library and Gradio for a user-friendly interface. The chatbot will run on your computer's GPU if available.
https://gitlab.com/nunet/ml-on-gpu/ml-on-gpu-service/-/raw/develop/examples/pytorch/flan-t5-large_chatbot.py
Here's what the code does:
1. It imports the required tools (Gradio, PyTorch, and Transformers).
2. The T5 model and tokenizer are loaded, which helps the chatbot understand and process text. The model will use your computer's GPU if possible, making it work faster.
3. A chatbot function is created that processes user input by tokenizing it and converting it into a PyTorch tensor. It then generates a response using the T5 model with specified settings.
4. The generated response is decoded, and any special tokens are removed before it's returned.
5. Using Gradio, a simple chat window is created for users to input text and see the chatbot's response.
6. Finally, the chat window is launched, allowing users to interact with the T5 model and see its responses.
This code helps you set up a simple and interactive chatbot, making it easy to use the T5 model to generate responses for user inputs.
This is a chatbot similar to the previous ones above.
https://gitlab.com/nunet/ml-on-gpu/ml-on-gpu-service/-/raw/develop/examples/pytorch/gpt2-large_chatbot.py
Here's what it does:
- This code creates a chatbot using a smart model called GPT-2 and a tool named Gradio that makes it easy to talk to the chatbot.
- The code uses tools from the Transformers library to load the GPT-2 model and set it up.
- A special token is added to help the model understand when a message starts and ends.
- The code has a `chatbot` function that changes your message into a form the model understands, makes the model think of a reply, and then changes the reply back to normal text.
- A simple Gradio chatbot interface is made, so you can ask questions and get answers from the chatbot.
Comparing this code with the previous ones, this one uses a model named GPT-2 instead of another model called T5. The way the model is loaded is a little different, but the overall idea of creating a chatbot using Gradio remains the same.
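A minimal sketch of that loading difference, assuming the `gpt2-large` checkpoint; the specific special-token choice below (reusing the end-of-text token) is illustrative rather than taken from the script:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
tokenizer = GPT2Tokenizer.from_pretrained("gpt2-large")
model = GPT2LMHeadModel.from_pretrained("gpt2-large").to(device)

# GPT-2 ships without a padding token; register one so the model
# knows where a message ends (illustrative choice)
tokenizer.pad_token = tokenizer.eos_token
model.config.pad_token_id = tokenizer.eos_token_id

def chatbot(message):
    inputs = tokenizer(message, return_tensors="pt").to(device)
    outputs = model.generate(**inputs, max_new_tokens=64)
    # Strip the prompt tokens and decode only the new reply
    reply_ids = outputs[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(reply_ids, skip_special_tokens=True)

print(chatbot("What is machine learning?"))
```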
This code uses `nn.DataParallel` to enable the model to be trained on multiple GPUs, thereby accelerating the training process through distributed computing.
https://gitlab.com/nunet/ml-on-gpu/ml-on-gpu-service/-/raw/develop/examples/pytorch/multi-gpu-test_checkpointed.py
1. The code checks whether a GPU is available and displays information about the available devices.
2. Defines a PyTorch model to be trained with an example linear function.
3. Initializes the model and moves it to GPU(s) if available.
4. Defines the loss function and optimizer.
5. Generates dummy input data for the model.
6. Attempts to read the epoch count from a file; if the file doesn't exist, creates it with a default value of 0.
7. Defines two helper functions to save and load the epoch count to/from a file.
8. Defines a function to initialize the model's weights.
9. Tries to load the model's weights from the last saved checkpoint. If there are no saved checkpoints, initializes the model's weights.
10. Defines the total number of epochs to run and the interval for printing the loss.
11. Trains the model for the specified number of epochs, printing the loss every few epochs.
12. Saves the model's weights and epoch count to a file after each epoch.
Please note that the sole purpose of this code was to test simultaneous execution of the process on multiple GPUs.
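A minimal sketch of the multi-GPU pattern this test exercises; the toy linear model, data shapes, and epoch count are illustrative:

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Found {torch.cuda.device_count()} CUDA device(s)")

# Toy model standing in for the example's linear function
model = nn.Linear(10, 1)

# Wrap the model so each batch is split across all visible GPUs
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)
model = model.to(device)

criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Dummy input data, as in the test script
x = torch.randn(64, 10).to(device)
y = torch.randn(64, 1).to(device)

for epoch in range(10):
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()
    if epoch % 2 == 0:  # print the loss every few epochs
        print(f"epoch {epoch}: loss {loss.item():.4f}")
```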
This code is a PyTorch implementation of the PaLM model for text generation. The code was written by Phil Wang and is licensed under the MIT license.
https://gitlab.com/nunet/ml-on-gpu/ml-on-gpu-service/-/raw/develop/examples/pytorch/palm-rlhf.py
It trains and generates text by following these steps:
- The code uses PyTorch and the palm-rlhf-pytorch library to implement the PaLM model for training and inference.
- The enwik8 dataset (a subset of Wikipedia) is used for training the model; it is downloaded using the curl command and saved locally in the data directory.
- A TextSamplerDataset is defined to create train and validation datasets from the enwik8 dataset.
- A PaLM model is instantiated with hyperparameters and moved to the device (GPU) using the accelerator.
- An optimizer is created to optimize the PaLM model using the Adam optimization algorithm.
- Training runs for a fixed number of batches: the loss is calculated with the PaLM model on the training dataset, gradients are accumulated and backpropagated, and the parameters are updated using the optimizer.
- Validation loss is also calculated every few batches to check the performance of the model on the validation dataset.
- The model is also used for generating sample text every few batches.
- After training, the model is used for inference: the user enters a prompt, the trained PaLM model generates text, and the output is printed.
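A minimal sketch of this training setup, assuming the palm-rlhf-pytorch package's `PaLM` class and that enwik8 has been downloaded to `./data/enwik8.gz`; all hyperparameter values are illustrative:

```python
import gzip
import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset
from accelerate import Accelerator
from palm_rlhf_pytorch import PaLM  # Phil Wang's palm-rlhf-pytorch package

# Illustrative hyperparameters
BATCH_SIZE = 4
SEQ_LEN = 1024
LEARNING_RATE = 3e-4
NUM_BATCHES = 100

accelerator = Accelerator()

# enwik8 is byte-level text; take the first 95 MB for training
with gzip.open("./data/enwik8.gz") as f:
    data = np.frombuffer(f.read(int(95e6)), dtype=np.uint8).copy()
train_data = torch.from_numpy(data)

class TextSamplerDataset(Dataset):
    # Samples random fixed-length windows from the byte-level corpus
    def __init__(self, data, seq_len):
        self.data, self.seq_len = data, seq_len
    def __len__(self):
        return self.data.size(0) // self.seq_len
    def __getitem__(self, index):
        start = torch.randint(0, self.data.size(0) - self.seq_len - 1, (1,))
        return self.data[start:start + self.seq_len + 1].long()

loader = DataLoader(TextSamplerDataset(train_data, SEQ_LEN), batch_size=BATCH_SIZE)

model = PaLM(num_tokens=256, dim=512, depth=8)
optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)
model, optimizer, loader = accelerator.prepare(model, optimizer, loader)

for i, batch in zip(range(NUM_BATCHES), loader):
    loss = model(batch, return_loss=True)  # byte-level LM loss
    accelerator.backward(loss)
    optimizer.step()
    optimizer.zero_grad()
    if i % 10 == 0:
        accelerator.print(f"batch {i}: loss {loss.item():.3f}")
```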
Summarizing, this code trains a neural network using the PaLM architecture to generate text similar to a given dataset. The dataset used here is `enwik8`, a 100 MB dump of the English Wikipedia. The code defines a PaLM model and uses a PyTorch DataLoader to feed the data to the model. It uses an accelerator library to distribute the computation across available devices, such as GPUs. Finally, it allows the user to test the trained model by inputting a sentence, to which the model responds with a predicted output.

This next code is a PaLM language model trained on the enwik8 dataset. It trains the model on GPUs using PyTorch's DataParallel library, enabling distributed computing. Additionally, the code implements checkpointing, which allows training to resume from the last checkpoint instead of restarting from the beginning.
https://gitlab.com/nunet/ml-on-gpu/ml-on-gpu-service/-/raw/develop/examples/pytorch/palm-rlhf_checkpointed.py
Revisiting the points with checkpointing:
- This code uses PyTorch's PaLM model and Accelerator library for distributed computing on GPUs.
- The code downloads the enwik8 dataset and divides it into training and validation sets.
- The code uses TextSamplerDataset to load data into PyTorch DataLoader, which is then used for training.
- It uses the Adam optimizer for training and also employs a learning rate scheduler.
- The code implements checkpointing to save model weights and optimizer states at a defined interval and allows resuming from the last checkpoint.
- The code trains the model for a defined number of batches and validates after a defined interval.
- It generates a sample text after a defined interval and also provides an option for the user to enter a prompt for generating text.
- The training and generation logs are displayed using the tqdm library.
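A minimal sketch of the save/resume pattern with optimizer state included; the file name and dictionary keys are illustrative, not taken from the script:

```python
import os
import torch

CKPT = "palm_checkpoint.pt"  # illustrative file name

def save_checkpoint(model, optimizer, batch_index):
    # Save everything needed to resume: weights, optimizer state, progress
    torch.save({
        "model": model.state_dict(),
        "optimizer": optimizer.state_dict(),
        "batch": batch_index,
    }, CKPT)

def load_checkpoint(model, optimizer):
    # Returns the batch index to resume from (0 if no checkpoint exists)
    if not os.path.exists(CKPT):
        return 0
    state = torch.load(CKPT)
    model.load_state_dict(state["model"])
    optimizer.load_state_dict(state["optimizer"])
    return state["batch"]
```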
Summarizing, this code trains a PaLM language model on the enwik8 dataset using PyTorch and Accelerator libraries. It implements checkpointing to resume training from the last checkpoint and uses PyTorch's DataParallel library for distributed computing on GPUs. The model is trained for a defined number of batches and generates sample text after a defined interval. The user can also enter a prompt for generating text. The training and generation logs are displayed using the tqdm library.
This code is an implementation of image classification using Fashion MNIST dataset with TensorFlow and Keras. The dataset consists of images of clothing items such as shirts, shoes, trousers, etc. The model is trained to classify these images into different categories.
https://gitlab.com/nunet/ml-on-gpu/ml-on-gpu-service/-/raw/develop/examples/tensorflow/fashion-mnist.py
Here's how it works:
1. The code imports the TensorFlow and Keras libraries.
2. The Fashion MNIST dataset is loaded using Keras.
3. The images are shown using matplotlib.
4. The images are normalized to the 0-1 range.
5. A sequential model is created using Keras with two dense layers.
6. The model is compiled using the Adam optimizer and sparse categorical crossentropy loss.
7. The model is trained using the training data.
8. The test loss and accuracy are evaluated using test data.
9. The model is used to predict the labels for the test data.
10. Functions are defined to plot the images and the predicted labels.
11. Plots are generated to show the predicted labels and true labels for some test images.
12. An individual image is selected and its label is predicted using the model.
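A minimal sketch of this pipeline; the layer sizes and epoch count are illustrative and may differ from the actual script:

```python
import tensorflow as tf

# Load and normalize Fashion-MNIST to the 0-1 range
(train_images, train_labels), (test_images, test_labels) = \
    tf.keras.datasets.fashion_mnist.load_data()
train_images, test_images = train_images / 255.0, test_images / 255.0

# Sequential model with two dense layers, as described above
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10),
])

model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)

model.fit(train_images, train_labels, epochs=10)
test_loss, test_acc = model.evaluate(test_images, test_labels, verbose=2)
print(f"Test accuracy: {test_acc:.3f}")
```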
In summary, this code trains a machine learning model to classify images of clothing from the Fashion-MNIST dataset. The code first loads and preprocesses the data, then builds and trains a sequential neural network model using the TensorFlow library. The trained model is used to make predictions on test data and visualize its performance through plots of images and their corresponding predicted and true labels. Finally, the model is used to predict the class of a single image.
This code trains a neural network classifier on the Fashion-MNIST dataset and uses checkpointing to save and restore the model weights. Checkpointing allows training to be interrupted and resumed without losing progress. The checkpoint is saved after every epoch, and the number of epochs completed before the training was interrupted is recorded in a text file.
https://gitlab.com/nunet/ml-on-gpu/ml-on-gpu-service/-/raw/develop/examples/tensorflow/fashion-mnist_checkpointed.py
Here is a breakdown of the code:
1. The necessary libraries are imported.
2. The Fashion-MNIST dataset is loaded and preprocessed. The class names are also defined.
3. The first image in the training set is displayed using Matplotlib.
4. The images in the training set are normalized to values between 0 and 1.
5. The first 25 images in the training set are displayed using Matplotlib.
6. The neural network model is defined using Keras.
7. A checkpoint callback is created to save the weights after each epoch.
8. If the checkpoint file exists, the model weights are loaded, and training is resumed. Otherwise, a new checkpoint file is created.
9. A custom callback is defined to update the epoch counter in the text file at the end of each epoch.
10. The model is trained with the `fit()` method, using the checkpoint and counter callbacks.
11. The model is evaluated on the test set.
12. The predictions are computed for the test set.
13. Two functions are defined to display the predicted labels and confidence scores for each test image.
14. The predicted labels and confidence scores for two test images are displayed using Matplotlib.
15. The predicted labels and confidence scores for several test images are displayed using Matplotlib.
16. An individual test image is displayed, and its predicted label and confidence score are computed and displayed using Matplotlib.
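A minimal sketch of the checkpoint callback and epoch counter described above; the file names and model shape are illustrative:

```python
import os
import tensorflow as tf

CHECKPOINT_PATH = "fashion_ckpt.weights.h5"  # illustrative file names
EPOCH_FILE = "epoch.txt"

(train_images, train_labels), _ = tf.keras.datasets.fashion_mnist.load_data()
train_images = train_images / 255.0

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10),
])
model.compile(optimizer="adam",
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=["accuracy"])

# Save the weights after every epoch
ckpt_cb = tf.keras.callbacks.ModelCheckpoint(
    filepath=CHECKPOINT_PATH, save_weights_only=True, verbose=1)

class EpochCounter(tf.keras.callbacks.Callback):
    # Record the number of completed epochs in a text file
    def on_epoch_end(self, epoch, logs=None):
        with open(EPOCH_FILE, "w") as f:
            f.write(str(epoch + 1))

# Resume from the checkpoint if one exists
initial_epoch = 0
if os.path.exists(CHECKPOINT_PATH) and os.path.exists(EPOCH_FILE):
    model.load_weights(CHECKPOINT_PATH)
    with open(EPOCH_FILE) as f:
        initial_epoch = int(f.read().strip())

model.fit(train_images, train_labels, epochs=10,
          initial_epoch=initial_epoch,
          callbacks=[ckpt_cb, EpochCounter()])
```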
In summary, this code trains a neural network classifier on the Fashion-MNIST dataset, using checkpointing to save and restore model weights and a custom callback to update the epoch counter. The predicted labels and confidence scores for test images are displayed using Matplotlib.
Introduction: This code trains a convolutional neural network on the CIFAR-10 dataset using PyTorch. It includes loading and preprocessing the dataset, defining the neural network, training the model, and evaluating its performance on the test set.
https://gitlab.com/nunet/ml-on-gpu/ml-on-cpu-service/-/raw/develop/examples/cifar-10_cpu_checkpointed.py
How it works:
1. The code imports PyTorch and torchvision modules.
2. It still checks if a GPU is available and sets the device accordingly.
3. It normalizes the CIFAR-10 dataset using torchvision.transforms.
4. It loads the training and test data using torchvision.datasets.CIFAR10 and creates dataloaders for them using torch.utils.data.DataLoader.
5. It defines the class names for the CIFAR-10 dataset.
6. It defines a function to display an image from the dataset using matplotlib.
7. It displays a few random images from the training set and their labels.
8. It defines the neural network using nn.Module and initializes its weights using a custom function.
9. It checks if a checkpoint file exists and loads the model's weights from it if it does.
10. It defines the loss function, optimizer, and the number of epochs to train for.
11. It trains the model for the specified number of epochs using the training set and the defined optimizer and loss function.
12. It saves the model's weights and the current epoch count to a file after each epoch.
13. It evaluates the performance of the model on the test set and prints the accuracy.
14. It calculates the accuracy of the model for each class in the dataset and prints it.
Summary: This code trains a convolutional neural network on the CIFAR-10 dataset using PyTorch. It loads and preprocesses the data, defines the neural network, trains the model, and evaluates its performance on the test set. It also saves the model's weights and epoch count to a file after each epoch and calculates the accuracy of the model for each class in the dataset.
The Iris dataset is a classic example in the field of machine learning used for classification tasks. In this code, we will use the scikit-learn CPU-only library to build a Decision Tree Classifier on the Iris dataset. The code will train the classifier on 80% of the data and test it on the remaining 20% of the data. Finally, the code will evaluate the model's performance using accuracy, classification report, and confusion matrix.
https://gitlab.com/nunet/ml-on-gpu/ml-on-cpu-service/-/blob/develop/examples/cpu-ml-test-scikit-learn.py
What it does:
- Load necessary libraries such as numpy, Scikit-learn's load_iris, train_test_split, DecisionTreeClassifier, accuracy_score, classification_report, and confusion_matrix.
- Load the Iris dataset and separate input features (X) and output labels (y).
- Split the dataset into train and test sets (80% training, 20% testing) using train_test_split.
- Create a Decision Tree Classifier and fit it to the training data using DecisionTreeClassifier and fit methods.
- Make predictions on the test set using predict method.
- Evaluate the model's performance using accuracy_score, classification_report, and confusion_matrix methods.
- Print the accuracy of the model on the test set.
- Print the classification report, which shows precision, recall, f1-score, and support for each class.
- Print the confusion matrix, which shows how many samples of each actual class were assigned to each predicted class, revealing where the model confuses one class for another.
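A minimal sketch of this pipeline; the `random_state` value is illustrative:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Load the Iris dataset: input features X, output labels y
X, y = load_iris(return_X_y=True)

# Split into 80% training and 20% testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Train a decision tree and predict on the held-out test set
clf = DecisionTreeClassifier()
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Evaluate the model's performance
print("Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
```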
So in this code, we used the CPU-only Scikit-learn library to build a Decision Tree Classifier on the Iris dataset. The code split the dataset into 80% training and 20% testing sets, trained the classifier on the training set, and tested it on the test set. Finally, we evaluated the model's performance using accuracy, classification report, and confusion matrix. The accuracy of the model on the test set was printed, and the classification report and confusion matrix were shown to provide additional insights into the model's performance.
This ML code snippet will deliberately produce an error, so it can be used to test the workflow and understand how we handle failed jobs:
https://gitlab.com/-/snippets/2523096/raw/main/tensor-shape-pytorch-error.py
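A minimal reproduction of the failing operation, assuming (from the error described below) that the snippet adds tensors of shapes (5, 2) and (3, 2):

```python
import torch

# Two tensors whose first dimensions do not match
a = torch.randn(5, 2)
b = torch.randn(3, 2)

# Element-wise addition requires matching (or broadcastable) shapes,
# so this line raises a RuntimeError
result = a + b
```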
When you try to run this program, you will receive the following message:
```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: The size of tensor a (5) must match the size of tensor b (3) at non-singleton dimension 0
```
This error occurs because we're trying to perform an operation (in this case, addition) on two tensors that don't have the same shape. In PyTorch, element-wise operations require the tensors to have the same shape or be broadcastable to a common shape. Here, the shapes `(5, 2)` and `(3, 2)` are not compatible because the sizes along the first dimension do not match, and they cannot be broadcast to a common shape.

There are, of course, numerous other types of errors you might encounter while working with PyTorch (or any ML/computational library, for that matter), including but not limited to:
1. `TypeError`: This can happen when you pass arguments of the wrong type to a function, for example, passing a list where a tensor is expected.
2. `ValueError`: This can occur when you pass arguments of the correct type but an incorrect value to a function, for example, passing negative integers to a function that expects positive integers.
3. `IndexError`: You may encounter this when you try to index a tensor using an invalid index.
4. `MemoryError`: This occurs when the system runs out of memory, often when trying to allocate very large tensors.
5. `RuntimeError`: This is a catch-all for various kinds of errors that can occur during execution. The tensor shape mismatch error we discussed earlier is a type of `RuntimeError`.
Here's an example of a code snippet that will give a `TypeError`:

```python
import torch

# Create a list
list1 = [1, 2, 3, 4, 5]

# Try to perform a tensor operation on the list
result = torch.tanh(list1)
```
When you run this program, you'll receive a `TypeError` with the following message:

```
tanh(): argument 'input' (position 1) must be Tensor, not list
```
The error handling approach varies depending on the type of error. For this `TypeError`, you can handle it by converting the list to a tensor before performing the operation:

```python
import torch

# Create a list
list1 = [1, 2, 3, 4, 5]

# Convert the list to a tensor
tensor1 = torch.tensor(list1)

# Perform the tensor operation
result = torch.tanh(tensor1)
```
From an ML developer's or researcher's perspective, it's always good practice to anticipate potential errors and handle them gracefully in the ML/Computational code.