Using Fox's GPUs
A GPU, or Graphics Processing Unit, is a computational unit which, as the name suggests, is optimized for graphics tasks. Nearly every computing device we interact with contains a GPU of some sort, responsible for transforming the information we want to display into actual pixels on our screens.
One question that might immediately present itself is: if GPUs are optimized for graphics, why are they interesting as computational resources? The full answer is complicated, but the short explanation is that many computational tasks have a lot in common with graphics computations. GPUs are built to work on large numbers of pixels at once, and since the operations on these pixels are almost identical, mainly involving floating point values, they can be run in parallel on dedicated hardware tailored and optimized for this particular task (i.e. the GPU). Working with a grid of pixels might sound familiar if one already works with a discrete grid in e.g. atmospheric simulation, which hints at why GPUs can be interesting in a computational context.
Since GPUs are optimized for working on grids of data and transforming that data, they are well suited for matrix calculations. As an indication of this, we can compare the theoretical performance of one GPU with one CPU.
| | AMD Epyc 7552 (CPU) | Nvidia A100 (GPU) |
|---|---|---|
| Half Precision | N/A | 78 TFLOPS |
| Single Precision | 1.5 TFLOPS | 19.5 TFLOPS |
| Double Precision | N/A | 9.7 TFLOPS |
Based on this it is no wonder that tensor libraries such as TensorFlow and PyTorch report speedups on accelerators of between 23x and 190x compared to using only a CPU.
Getting started
Of the resources provided on Fox, only the accel job type currently has GPUs available. To access these one has to select the correct partition as well as request one or more GPUs to utilize. To select the correct partition, use the --partition=accel flag with either srun or salloc, or in your Slurm script. This flag ensures that your job is only run on machines in the accel partition, which have attached GPUs. However, to be able to actually interact with one or more GPUs we also have to add --gpus=N, which tells Slurm that we would like to use N GPUs (read more about available flags in the official Slurm documentation). Each accel node in Fox contains four GPUs, i.e. N above can be set to {1, 2, 3, 4}.
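As a quick illustration (a minimal sketch: the account is a placeholder and the resource values are example numbers, not recommendations), requesting two GPUs for a single command could look like this:
$ srun --account=ec<XX> --partition=accel --gpus=2 --time=00:02:00 --mem-per-cpu=1G nvidia-smi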
Note: Research groups are already contributing to the growth of Fox, and as such Fox already contains nodes with different GPU configurations. To learn how to select different GPUs, see our documentation on selecting GPUs in Slurm.
Connecting to the cluster
To get started we first have to SSH into Fox:
$ ssh <username>@fox.educloud.no
Interactive testing
All projects should have access to GPU resources, and to that end we will start by simply testing that we can get access to a single GPU. To do this we will run an interactive job on the accel partition, asking for a single GPU.
$ salloc --account=ec<XX> --ntasks=1 --mem-per-cpu=1G --time=00:05:00 --qos=devel --partition=accel --gpus=1
$ nvidia-smi
The two commands above should result in something like:
Thu Jun 17 08:49:01 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 465.19.01 Driver Version: 465.19.01 CUDA Version: 11.3 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA A100-PCI... Off | 00000000:24:00.0 Off | 0 |
| N/A 28C P0 33W / 250W | 0MiB / 40536MiB | 0% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
Note: In the above Slurm specification we combined --qos=devel with GPUs and interactive operations so that we can experiment with commands interactively. This can be a good way to perform short tests to ensure that libraries correctly pick up GPUs when developing your experiments. Read more about --qos=devel in our guide on interactive jobs.
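Once the interactive session has started, a quick sanity check is to load a GPU-enabled library and ask it which devices it can see. The sketch below assumes the TensorFlow/2.4.1-fosscuda-2020b module that is also used in the Slurm examples further down:
$ module load TensorFlow/2.4.1-fosscuda-2020b
$ python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
If the allocation includes a GPU, the printed list should contain one physical GPU device; an empty list means the library did not detect any.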
Slurm script testing
The next thing we will try is to use the TensorFlow/2.4.1-fosscuda-2020b library to execute a very simple computation on the GPU. We could do the following interactively in Python, but to introduce Slurm scripts we will now make a quick transition (which can also make it a bit easier, since we don't have to sit and wait for the interactive session to start).
We will use the following simple calculation in Python and TensorFlow to test the GPUs of Fox:
#!/usr/bin/env python3
import tensorflow as tf
# Test if there are any GPUs available
print(f"Num GPUs Available: {len(tf.config.list_physical_devices('GPU'))}")
# Have Tensorflow output where computations are run
tf.debugging.set_log_device_placement(True)
# Create some tensors
a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
c = tf.matmul(a, b)
# Print result
print(c)
Save the above as gpu_intro.py on Fox.
To run this we first have to create a Slurm script in which we request resources. A good place to start is with a basic job script (see Job Scripts). Use the following to create submit_gpu.sh (remember to substitute your project number under --account):
#!/bin/bash
#SBATCH --job-name=TestGPUOnFox
#SBATCH --account=ec<XX>
#SBATCH --time=05:00
#SBATCH --mem-per-cpu=512M
#SBATCH --qos=devel
## Set up job environment:
set -o errexit # Exit the script on any error
set -o nounset # Treat any unset variables as an error
module --quiet purge # Reset the modules to the system default
module load TensorFlow/2.4.1-fosscuda-2020b
module list
python gpu_intro.py
If we just run the above Slurm script with sbatch submit_gpu.sh, the output (found in the same directory as you executed the sbatch command, with a name like slurm-<job-id>.out) will contain several errors as TensorFlow attempts to communicate with the GPU; however, the program will still run and give the following successful output:
Num GPUs Available: 0
tf.Tensor(
[[22. 28.]
[49. 64.]], shape=(2, 2), dtype=float32)
So the above eventually ran fine, but did not report any GPUs. The reason for this is of course that we never asked for any GPUs in the first place. To remedy this we will change the Slurm script to include --partition=accel and --gpus=1, as follows:
#!/bin/bash
#SBATCH --job-name=TestGPUOnFox
#SBATCH --account=ec<XX>
#SBATCH --time=05:00
#SBATCH --mem-per-cpu=512M
#SBATCH --qos=devel
#SBATCH --partition=accel
#SBATCH --gpus=1
## Set up job environment:
set -o errexit # Exit the script on any error
set -o nounset # Treat any unset variables as an error
module --quiet purge # Reset the modules to the system default
module load TensorFlow/2.4.1-fosscuda-2020b
module list
python gpu_intro.py
We should now see the following output:
Num GPUs Available: 1
tf.Tensor(
[[22. 28.]
[49. 64.]], shape=(2, 2), dtype=float32)
However, with complicated libraries such as TensorFlow we are still not guaranteed that the above actually ran on the GPU. There is some output to verify this, but we will instead check it manually, as that approach can be applied more generally.
Monitoring the GPUs
To monitor the GPU(s), we will start nvidia-smi before our job and let it run while we use the GPU. We will change the submit_gpu.sh Slurm script above into submit_monitor.sh, shown below:
#!/bin/bash
#SBATCH --job-name=TestGPUOnFox
#SBATCH --account=ec<XX>
#SBATCH --time=05:00
#SBATCH --mem-per-cpu=512M
#SBATCH --qos=devel
#SBATCH --partition=accel
#SBATCH --gpus=1
## Set up job environment:
set -o errexit # Exit the script on any error
set -o nounset # Treat any unset variables as an error
module --quiet purge # Reset the modules to the system default
module load TensorFlow/2.4.1-fosscuda-2020b
module list
# Setup monitoring
nvidia-smi --query-gpu=timestamp,utilization.gpu,utilization.memory \
--format=csv --loop=1 > "gpu_util-$SLURM_JOB_ID.csv" &
NVIDIA_MONITOR_PID=$! # Capture PID of monitoring process
# Run our computation
python gpu_intro.py
# After computation stop monitoring
kill -SIGINT "$NVIDIA_MONITOR_PID"
Note: The query used to monitor the GPU can be further extended by adding additional parameters to the --query-gpu flag. Check available options here.
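As a sketch of such an extension (the extra fields are assumptions on our part; verify them with nvidia-smi --help-query-gpu on the node you run on), the monitoring line in submit_monitor.sh could also log memory use and power draw:
# Extended monitoring: additionally log GPU memory use and power draw
nvidia-smi --query-gpu=timestamp,utilization.gpu,utilization.memory,memory.used,power.draw \
--format=csv --loop=1 > "gpu_util-$SLURM_JOB_ID.csv" &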
Run this script with sbatch submit_monitor.sh to test if the output gpu_util-<job id>.csv actually contains some data. We can then use this data to ensure that we are actually using the GPU as intended. Pay specific attention to utilization.gpu, which shows the percentage of how much processing the GPU is doing. It is not expected that this will always be 100%, since we also need to transfer data, but the average should be quite high.
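As an optional, rough way to summarize the CSV, the following awk one-liner averages the utilization.gpu column (the second field with the query used above); treat it as a sketch, since lines reporting unsupported values would need extra handling:
$ awk -F',' 'NR > 1 { gsub(/[ %]/, "", $2); sum += $2; n++ } END { if (n > 0) print sum / n " % average GPU utilization" }' gpu_util-<job id>.csv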
Note: We are working on automating the above monitoring solution so that all GPU jobs output similar statistics. In the meantime, the above solution can help indicate your job's resource utilization.
CC Attribution: This page is maintained by the University of Oslo IT FFU-BT group. It has either been modified from, or is a derivative of, "Introduction to using GPU compute" by NRIS under CC-BY-4.0. Changes: Removed "Next steps" section.