Computing resources guide for the Agarwal research group — University of Western Ontario
The Digital Research Alliance of Canada provides national high-performance computing (HPC) infrastructure for Canadian researchers. This page covers how to get access to the Nibi cluster and use it for research computing.
Create an account at the CCDB portal and use the following sponsor when prompted:
Sponsor: Ankush Agarwal (CCRI: aju-094-01)
Once your account is approved, log in to Nibi with:
ssh <user>@nibi.alliancecan.ca
Replace <user> with your Alliance username throughout this guide.
Generate an SSH key pair on your local machine (skip if you already have one):
ssh-keygen -t rsa
Copy your public key to Nibi so future logins require no password:
ssh-copy-id -i ~/.ssh/id_rsa.pub <user>@nibi.alliancecan.ca
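The "skip if you already have one" check in the key-generation step can be scripted. The following is a minimal sketch, not part of the official setup; the function name `ensure_ssh_key` is hypothetical, and the default path assumes the `~/.ssh/id_rsa` location used by the commands above.

```shell
# Hypothetical helper: generate an RSA key pair only if none exists yet.
# The default path matches the id_rsa.pub file used with ssh-copy-id above.
ensure_ssh_key() {
    local key="${1:-$HOME/.ssh/id_rsa}"
    if [ -f "$key" ]; then
        # A private key is already present, so leave it untouched.
        echo "existing key: $key"
    else
        # ssh-keygen will prompt interactively for a passphrase.
        ssh-keygen -t rsa -f "$key"
    fi
}
```

Calling `ensure_ssh_key` with no argument checks the default location; pass a different path as the first argument if your key lives elsewhere.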
The recommended method for transferring large datasets is Globus, which provides reliable, high-speed transfers and can resume interrupted transfers automatically.
The Globus endpoint for Nibi is:
alliancecan#nibi
Log in at globus.org, search for the endpoint above, and use the graphical interface to initiate transfers.
To allow other group members to access your files in the shared project space, run the following commands:
chgrp -R def-ankush ~/projects/*/$USER
chmod -R g+rwXs ~/projects/*/$USER
chgrp -R def-ankush $HOME
chmod -R g+rwXs $HOME
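To confirm that the permission changes above took effect, one option is to list anything that is still outside the group or not group-readable. This is a sketch under stated assumptions: the function name `list_unshared` is hypothetical, and it relies only on the standard `find` tests `-group` and `-perm`.

```shell
# Hypothetical helper: print every entry under the given directories that
# either does not belong to the group or is not readable by the group.
# Usage: list_unshared <group> <dir> [<dir> ...]
list_unshared() {
    local group="$1"
    shift
    find "$@" \( ! -group "$group" -o ! -perm -g=r \) -print
}
```

For example, `list_unshared def-ankush ~/projects/*/$USER` should print nothing once the `chgrp` and `chmod` commands have been applied to that directory.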
Jobs on Nibi are managed by SLURM. Below are common job templates for typical workloads.
| Profile | Cores | Memory | Wall time |
|---|---|---|---|
| Regular | 8 | 32 GB | 24 hours |
| Short | 8 | 32 GB | 3 hours |
| Fat (memory-intensive) | 32 | 128 GB | 24 hours |
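The profiles in the table can be expressed as `sbatch` options. The sketch below uses a hypothetical function, `profile_flags`, and assumes that the "Cores" column corresponds to `--cpus-per-task` on a single task, as in the group's batch script.

```shell
# Hypothetical helper: translate a profile name from the table into sbatch flags.
# Assumption: "Cores" maps to --cpus-per-task for a single-task job.
profile_flags() {
    case "$1" in
        regular) echo "--cpus-per-task=8 --mem=32G --time=24:00:00" ;;
        short)   echo "--cpus-per-task=8 --mem=32G --time=03:00:00" ;;
        fat)     echo "--cpus-per-task=32 --mem=128G --time=24:00:00" ;;
        *)       echo "unknown profile: $1" >&2; return 1 ;;
    esac
}
```

A possible use is `sbatch $(profile_flags short) --account=def-ankush job.sh`. Options given on the command line take precedence over `#SBATCH` lines in the script, so the same script can be reused across profiles.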
A minimal SLURM batch script looks like:
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=32G
#SBATCH --time=24:00:00
#SBATCH --account=def-ankush
module load python/3.11
python my_script.py
Submit with sbatch job.sh. For interactive sessions use:
salloc --ntasks=1 --cpus-per-task=4 --mem=16G --time=2:00:00 --account=def-ankush
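For batch submissions, it can be handy to capture the job ID so it can be passed to `scancel` or `sacct` later. The sketch below wraps `sbatch --parsable`, which prints only the job ID (optionally followed by `;` and a cluster name). The function name `submit_job` is hypothetical.

```shell
# Hypothetical helper: submit a batch script and print only the numeric job ID.
# Usage: submit_job [sbatch options] <script>
submit_job() {
    local jobid
    # --parsable makes sbatch print "jobid" or "jobid;cluster" and nothing else.
    jobid=$(sbatch --parsable "$@") || return 1
    # Drop the optional ";cluster" suffix.
    echo "${jobid%%;*}"
}
```

For example, `jobid=$(submit_job job.sh)` stores the ID in a shell variable for later commands.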
| Command | Description |
|---|---|
| sq | View your queued and running jobs |
| sshare -U $USER | Check your current share usage |
| scancel <jobid> | Cancel a specific job |
| scancel -u $USER | Cancel all your jobs |
| sacct -j <jobid> --format JobID,ReqMem,MaxRSS,Timelimit,Elapsed | Check resource usage of a completed job |
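The `MaxRSS` values reported by `sacct` can be compared against the requested memory to right-size future jobs. The following is a rough sketch: the function names `to_mib` and `max_rss_mib` are hypothetical, and the code assumes memory values carry a K, M, G, or T suffix (values without a recognized suffix are skipped).

```shell
# Hypothetical helper: convert a memory string such as 32G or 1048576K
# to whole MiB (fractions are truncated). Fails on an unrecognized suffix.
to_mib() {
    awk -v v="$1" 'BEGIN {
        unit = toupper(substr(v, length(v), 1))
        num = v + 0
        if      (unit == "K") num = num / 1024
        else if (unit == "M") num = num
        else if (unit == "G") num = num * 1024
        else if (unit == "T") num = num * 1024 * 1024
        else exit 1
        printf "%d\n", num
    }'
}

# Hypothetical helper: read "JobID|MaxRSS" lines on stdin and print the
# largest MaxRSS across all job steps, in MiB.
max_rss_mib() {
    local jobid rss mib max=0
    while IFS='|' read -r jobid rss; do
        # Lines with no MaxRSS recorded are skipped.
        [ -n "$rss" ] || continue
        mib=$(to_mib "$rss") || continue
        if [ "$mib" -gt "$max" ]; then
            max=$mib
        fi
    done
    echo "$max"
}
```

A possible invocation is `sacct -j <jobid> --format=JobID,MaxRSS --parsable2 --noheader | max_rss_mib`, where `--parsable2` produces the pipe-delimited output this sketch expects.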
For more readable sacct output, run the following commands once to append these settings to your ~/.bashrc (they take effect in new shell sessions):
echo "export SLURM_TIME_FORMAT=relative" >> ~/.bashrc
echo "export SACCT_FORMAT=JobID%-20,Start%-10,Elapsed%-10,State,AllocCPUS%8,MaxRSS,NodeList" >> ~/.bashrc