Rapids#
Prerequisites#
Regardless of what container program you use, you will need to install the following prerequisites.
CUDA Drivers#
NVIDIA CUDA Toolkit#
Docker Desktop#
Note
Docker Desktop is required even if you only intend to use Singularity containers.
Make sure your version of the CUDA Toolkit matches the version of your CUDA drivers.
Further installation instructions are available in the official RAPIDS documentation here:
Details on installing the CUDA Toolkit can be found under the Docker heading, along with solutions for common installation errors.
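One way to compare the toolkit and driver versions mentioned in the note above is to read them from nvcc and nvidia-smi. The following is a minimal sketch, not an official check: the helper names toolkit_version and driver_cuda_version are hypothetical, and the parsing assumes the usual "release X.Y" and "CUDA Version: X.Y" strings in the output of those tools.

```shell
# Hypothetical helpers: pull an "X.Y" version out of tool output read from stdin.

# Expects text such as "Cuda compilation tools, release 11.8, V11.8.89".
toolkit_version() {
  sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' | head -n 1
}

# Expects text containing "CUDA Version: 12.0" (assumed nvidia-smi header format).
driver_cuda_version() {
  sed -n 's/.*CUDA Version: *\([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' | head -n 1
}

# Only query the tools if both are installed and on PATH.
if command -v nvcc >/dev/null 2>&1 && command -v nvidia-smi >/dev/null 2>&1; then
  echo "Toolkit CUDA version: $(nvcc --version | toolkit_version)"
  echo "Driver CUDA version:  $(nvidia-smi | driver_cuda_version)"
else
  echo "nvcc or nvidia-smi not found on PATH; install the prerequisites first."
fi
```

If the two printed versions differ, revisit the driver or toolkit installation before pulling a container.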
Container Platform#
Installation guides for the container platform can be found here:
Apptainer#
First, pull the RAPIDS container. The CUDA version, Python version, and image type (runtime) in the tag can be changed to pull a different variant.
apptainer pull docker://nvcr.io/nvidia/rapidsai/rapidsai:23.04-cuda11.8-runtime-ubuntu22.04-py3.10
This pulls the container and converts it to a SIF container. Note that the build process needs between 40 and 50 GB of free space.
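Because the build needs that much space, it can help to check the free space before pulling. This is a rough sketch under stated assumptions: REQUIRED_GB and BUILD_DIR are hypothetical variable names, and only the target directory is checked, although Apptainer may also use separate cache and temporary directories.

```shell
REQUIRED_GB=50                # upper end of the 40 to 50 GB estimate above
BUILD_DIR="${BUILD_DIR:-.}"   # hypothetical: directory where the SIF will be written

# df -Pk prints POSIX-format output in 1024-byte blocks; column 4 is the available space.
avail_kb=$(df -Pk "$BUILD_DIR" | awk 'NR==2 {print $4}')
avail_gb=$((avail_kb / 1024 / 1024))

if [ "$avail_gb" -ge "$REQUIRED_GB" ]; then
  echo "OK: ${avail_gb} GB free in ${BUILD_DIR}"
else
  echo "Warning: only ${avail_gb} GB free in ${BUILD_DIR}; the pull may fail."
fi
```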
Run the container:
apptainer run rapidsai_23.04-cuda11.8-runtime-ubuntu22.04-py3.10.sif
Use the shell command to open an interactive shell in the container and execute commands within it:
apptainer shell rapidsai_23.04-cuda11.8-runtime-ubuntu22.04-py3.10.sif
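The image reference and the SIF file name used in this section follow the same pattern, so both can be assembled from a few variables. The sketch below only prints the resulting command; the variable names are hypothetical, the values mirror the tag used above, and not every combination of versions is necessarily published as a tag.

```shell
# Hypothetical variable names; the values reproduce the tag used in this section.
RAPIDS_VERSION="23.04"
CUDA_VERSION="11.8"
IMAGE_TYPE="runtime"
UBUNTU_VERSION="22.04"
PYTHON_VERSION="3.10"

TAG="${RAPIDS_VERSION}-cuda${CUDA_VERSION}-${IMAGE_TYPE}-ubuntu${UBUNTU_VERSION}-py${PYTHON_VERSION}"
IMAGE="docker://nvcr.io/nvidia/rapidsai/rapidsai:${TAG}"
SIF="rapidsai_${TAG}.sif"     # file name used by the run and shell commands above

# Print the command instead of running it, so the tag can be reviewed first.
echo "apptainer pull ${IMAGE}"
echo "expected SIF file: ${SIF}"
```

Changing a single variable, for example PYTHON_VERSION, updates both the pull target and the file name passed to apptainer run and apptainer shell.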
Singularity#
Pull the container and build it into a SIF file:
singularity build rapidsai.sif docker://rapidsai/rapidsai-core:23.06-cuda11.8-runtime-ubuntu22.04-py3.10
A list of available variants and their associated tags can be found here: Docker Hub
Note
If you intend to use NumPy, be sure to specify Python version 3.9 or lower.
If you are using a GPU with the Pascal architecture, be sure to specify RAPIDS version 21.08 or lower.
Running Containers Locally#
Run the container:
singularity run --nvccli rapidsai.sif
Note
The --nvccli flag is used for compatibility with WSL2. On newer systems and on the supercomputer, it can be replaced with --nv:
singularity run --nv rapidsai.sif
Note
Containers can run in two ways: interactively and non-interactively. When a container runs interactively, you enter a prompt where you can interact with its file system. When it runs non-interactively, you do not enter a prompt, but any output the container produces (for example, from scripts included in it) is still displayed.
Once you are inside the container, be sure to activate the rapids environment:
source activate rapids
This command also works:
. activate rapids
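The two modes and the environment activation can be combined in a small wrapper. This is a sketch only: the run helper and the DRY_RUN variable are hypothetical, /workspace/randomforest.py is assumed to exist in the image (as in the definition file examples on this page), and it assumes that source activate works inside bash -c the same way it does at the interactive prompt.

```shell
SIF="rapidsai.sif"
DRY_RUN="${DRY_RUN:-1}"   # hypothetical switch: 1 = only print the command, 0 = run it

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"             # note: printing drops the quoting around the bash -c argument
  else
    "$@"
  fi
}

# Interactive: opens a prompt inside the container.
run singularity shell --nv "$SIF"

# Non-interactive: activates the rapids environment, runs one script, then exits.
run singularity exec --nv "$SIF" bash -c "source activate rapids && python /workspace/randomforest.py"
```

Setting DRY_RUN=0 before running the script executes the commands instead of printing them.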
Extending / Rebuilding Containers#
Containers can be rebuilt and modified using definition files and the build command:
singularity build rapidsai.sif new.def
Definition Files#
Building from a local base container#
The following is an example/template of a definition file that uses an existing local container as a base. It creates a workspace directory, then copies randomforest.py from the host system into the container:
Bootstrap: localimage
From: rapidsai2.sif
%setup
mkdir ${SINGULARITY_ROOTFS}/workspace
cp randomforest.py ${SINGULARITY_ROOTFS}/workspace
%post
# Additional installation and configuration steps
%runscript
# Start script or command
%environment
# Environment variables
%labels
# Metadata and labels
Building from a pulled base container#
The following is the definition file for building a DeepHyperX container (modified to start from a pulled RAPIDS base container and mostly commented out). It shows what can be put in each section:
Bootstrap: docker
From: nvcr.io/nvidia/rapidsai/rapidsai:21.08-cuda11.0-runtime-ubuntu20.04
%setup
# Create a directory called "workspace" at the root of the image's file system.
mkdir ${SINGULARITY_ROOTFS}/workspace
#mkdir ${SINGULARITY_ROOTFS}/workspace/three_layer_classification
#mkdir ${SINGULARITY_ROOTFS}/workspace/SpectralNet
#mkdir ${SINGULARITY_ROOTFS}/workspace/A2S2KResNet
# Create a directory called "data" and a directory called "Datasets" inside that folder on the host system.
#mkdir -p data/Datasets
%files
# Copy files from the host machine to the workspace folder on the Singularity image.
#main.py /workspace
#mainSpectral.py /workspace
#mainTesting.py /workspace
#models.py /workspace
#datasets.py /workspace
#custom_datasets.py /workspace
#inference.py /workspace
#utils.py /workspace
#License /workspace
#README.md /workspace
#requirements.txt /workspace
start.sh /workspace
#three_layer_classification/*.py /workspace/three_layer_classification
#A2S2KResNet/*.py /workspace/A2S2KResNet
#SpectralNet/*.py /workspace/SpectralNet
randomforest.py /workspace
%environment
# Set environment variables
export LISTEN_PORT=8097
export LC_ALL=C
export PATH=/opt/conda/bin:$PATH
export DEBIAN_FRONTEND=noninteractive
%post
# Install necessary packages
#apt-get update
#apt-get install -y curl
#apt-get install -y tzdata
#apt-get install -y bzip2
#apt-get install -y ca-certificates
#apt-get install ffmpeg libsm6 libxext6 -y
#ln -fs /usr/share/zoneinfo/America/New_York /etc/localtime
#dpkg-reconfigure --frontend noninteractive tzdata
#curl -o ~/miniconda.sh -O https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh && \
#chmod +x ~/miniconda.sh && \
#~/miniconda.sh -b -p /opt/conda && \
#rm ~/miniconda.sh
#rm -rf /var/lib/apt/lists/*
%runscript
# When the SIF file is run, it will download the packages listed in requirements.txt if not already installed.
#echo "installing packages"
#pip3 install --no-cache-dir -r /workspace/requirements.txt
# It will then run start.sh
sh /workspace/start.sh
cd /workspace
%startscript
nc -lp $LISTEN_PORT
%test
grep -q NAME=\"Ubuntu\" /etc/os-release
if [ $? -eq 0 ]; then
echo "Container base is Ubuntu as expected."
else
echo "Container base is not Ubuntu."
fi
%labels
Author d@sylabs.io
Version v0.0.1
%help
This is a demo container used to illustrate a def file that uses all
supported sections.
Transferring a SIF to Expanse#
In a local terminal, navigate to the directory containing the SIF file and use the following scp command to transfer the file to the supercomputer:
scp rapidsai.sif <username>@login.expanse.sdsc.edu:/expanse/lustre/scratch/<user>/temp_project/rapids
The transfer can be slow. If the connection breaks, you can resume the transfer using the following command:
rsync --partial --progress --rsh=ssh rapidsai.sif <username>@login.expanse.sdsc.edu:/expanse/lustre/scratch/<user>/temp_project/rapids/rapidsai.sif
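For unreliable connections, the resume command can be wrapped in a simple retry loop. The retry function below is a hypothetical helper, not part of rsync or Expanse tooling, and the attempt count and delay in the usage comment are arbitrary example values.

```shell
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND until it succeeds or MAX_ATTEMPTS is reached.
retry() {
  _max="$1"
  _delay="$2"
  shift 2
  _attempt=1
  until "$@"; do
    if [ "$_attempt" -ge "$_max" ]; then
      echo "giving up after ${_attempt} attempts" >&2
      return 1
    fi
    _attempt=$((_attempt + 1))
    sleep "$_delay"
  done
}

# Usage (fill in the placeholders before running):
# retry 5 30 rsync --partial --progress --rsh=ssh rapidsai.sif \
#   <username>@login.expanse.sdsc.edu:/expanse/lustre/scratch/<user>/temp_project/rapids/rapidsai.sif
```

Because rsync is called with --partial, each new attempt can continue from the partially transferred file rather than starting over.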
Running the container on Expanse#
Log in to Expanse using ssh, then navigate to your scratch directory:
cd /expanse/lustre/scratch/$USER/temp_project
In the example above, the file was transferred into a rapids folder; navigate to wherever you transferred the SIF file. Next, request an interactive session on the gpu-shared partition with the following command:
srun --partition=gpu-shared --pty --account=aub101 --ntasks-per-node=10 --nodes=1 --mem=96G --gpus=1 -t 00:30:00 --wait=0 --export=ALL /bin/bash
Note
The session ends after 30 minutes (-t 00:30:00). Anything still running when the time runs out will stop.
Next, load the Singularity module:
module load singularitypro/3.9
Now you can run the container:
singularity run --nv rapidsai.sif
Note
You can also use ‘singularity exec’ to execute commands within the container, and ‘singularity shell’ to enter an interactive shell in the container.
Depending on whether the container was built to be interactive, you may or may not enter a prompt. If you don’t enter a prompt and don’t get any errors, the container ran but produced no output. To make sure you enter a shell in the container, use the following command:
singularity shell --nv rapidsai.sif
You should see a prompt like this “singularity >>”, which means you are now inside the container. You can now interact with the container, or leave it by typing exit. First, activate the rapids environment. Without it, none of the RAPIDS libraries will load.
source activate rapids
Now run the following command to confirm that you have access to the GPU and that the GPU drivers are working correctly:
nvidia-smi
You should see output with information such as the GPU type and the CUDA driver version. This confirms that the GPU is accessible from inside the container.
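nvidia-smi reports on the GPU and driver only, so a short import test can additionally confirm that the RAPIDS libraries themselves load. The sketch below assumes the rapids environment is already active and that cudf is installed in it; check_rapids is a hypothetical helper name.

```shell
# Returns 0 if cudf imports and a tiny GPU computation succeeds, 1 otherwise.
check_rapids() {
  command -v python >/dev/null 2>&1 || { echo "python not found on PATH"; return 1; }
  python -c 'import cudf; print("cudf", cudf.__version__, "sum:", cudf.Series([1, 2, 3]).sum())' 2>/dev/null \
    || { echo "cudf could not be imported or used"; return 1; }
}

if check_rapids; then
  echo "RAPIDS is usable in this environment."
else
  echo "RAPIDS is not usable here; check that the rapids environment is active."
fi
```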