The Digital Research Alliance of Canada provides national high-performance computing (HPC) infrastructure for Canadian researchers. This page covers how to get access to the Nibi cluster and use it for research computing.

Contents
  1. Getting an Account
  2. Passwordless SSH Setup
  3. Mounting Data (Mac/Linux)
  4. File Transfer
  5. Group Permissions
  6. Job Submission
  7. SLURM Commands
  8. Visualization

1. Getting an Account

Create an account at the CCDB portal and use the following sponsor when prompted:

Sponsor: Ankush Agarwal  (CCRI: aju-094-01)

Once your account is approved, log in to Nibi with:

ssh <user>@nibi.alliancecan.ca

Replace <user> with your Alliance Canada username throughout this guide.

2. Passwordless SSH Setup

Generate an SSH key pair on your local machine (skip if you already have one):

ssh-keygen -t rsa
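To decide whether the key generation step can be skipped, a quick check along these lines may help. It is only a sketch and assumes the key lives at the default path ~/.ssh/id_rsa used elsewhere in this section; the variable names key and key_status are illustrative.

```shell
# Assumed default key path, matching the ssh-copy-id command in this section.
key="$HOME/.ssh/id_rsa"

# A usable pair needs both the private key and its .pub counterpart.
if [ -f "$key" ] && [ -f "$key.pub" ]; then
    key_status="present"
    echo "Key pair found at $key; no need to run ssh-keygen."
else
    key_status="missing"
    echo "No key pair at $key; run ssh-keygen -t rsa to create one."
fi
```

If your key uses a different name or type, adjust the path in key accordingly.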

Copy your public key to Nibi so future logins require no password:

ssh-copy-id -i ~/.ssh/id_rsa.pub <user>@nibi.alliancecan.ca

4. File Transfer

The recommended method for transferring large datasets is Globus, which provides reliable, high-speed transfers and can resume interrupted transfers automatically.

The Globus endpoint for Nibi is:

alliancecan#nibi

Log in at globus.org, search for the endpoint above, and use the graphical interface to initiate transfers.

5. Group Permissions

To allow other group members to access your files in the shared project space, run the following commands:

chgrp -R def-ankush ~/projects/*/$USER
chmod -R g+rwXs ~/projects/*/$USER
chgrp -R def-ankush $HOME
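chmod -R g+rwXs $HOME

Note that the last two commands also make $HOME and ~/.ssh group-writable. With OpenSSH's default StrictModes setting, the server then refuses key-based logins, so the passwordless setup from section 2 stops working. If that happens, remove group write access from those paths:

chmod g-w $HOME ~/.ssh ~/.ssh/authorized_keys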
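To see what the g+rwXs mode does before applying it to real data, the following sketch runs the same chmod on a throwaway directory. The paths demo, sub and data.txt are hypothetical and exist only for this illustration.

```shell
# Build a small scratch tree with owner-only permissions.
demo=$(mktemp -d)
mkdir "$demo/sub"
touch "$demo/sub/data.txt"
chmod -R u=rwX,go= "$demo"

# Same mode string as in the commands above:
#   rw  group read and write
#   X   execute only on directories (and on files that are already executable)
#   s   setgid: new files created in a directory inherit its group
chmod -R g+rwXs "$demo"

# Inspect the result; the directory should now be group-accessible and setgid.
ls -ld "$demo/sub" "$demo/sub/data.txt"
```

The capital X is what keeps plain data files from becoming executable while still letting group members enter directories.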

6. Job Submission

Jobs on Nibi are managed by SLURM. The table below lists common resource profiles for typical workloads.

Profile                 Cores  Memory  Wall time
Regular                 8      32 GB   24 hours
Short                   8      32 GB   3 hours
Fat (memory-intensive)  32     128 GB  24 hours

A minimal SLURM batch script looks like:

#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=32G
#SBATCH --time=24:00:00
#SBATCH --account=def-ankush

module load python/3.11
python my_script.py

Save the script as job.sh and submit it with sbatch job.sh. For interactive sessions use:

salloc --ntasks=1 --cpus-per-task=4 --mem=16G --time=2:00:00 --account=def-ankush
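As an illustration of how the profile table maps onto a batch script, the Fat profile might be written as follows. This is a sketch: the resource values come from the table, while the threads variable and the use of OMP_NUM_THREADS are assumptions that only make sense for programs that honour that variable.

```shell
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=32
#SBATCH --mem=128G
#SBATCH --time=24:00:00
#SBATCH --account=def-ankush

# SLURM sets SLURM_CPUS_PER_TASK inside the job; fall back to 1 elsewhere.
threads="${SLURM_CPUS_PER_TASK:-1}"

# Assumption: the workload reads OMP_NUM_THREADS to size its thread pool.
export OMP_NUM_THREADS="$threads"
echo "Using $threads thread(s)"

# Load modules and start your program here, as in the minimal script.
```

For the Short profile, the same script with --cpus-per-task=8, --mem=32G and --time=3:00:00 should apply.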

7. SLURM Commands

Command                                                          Description
sq                                                               View your queued and running jobs
sshare -U -u $USER                                               Check your current share usage
scancel <jobid>                                                  Cancel a specific job
scancel -u $USER                                                 Cancel all your jobs
sacct -j <jobid> --format JobID,ReqMem,MaxRSS,Timelimit,Elapsed  Check resource usage of a completed job

Run the following once to append these settings to your ~/.bashrc for more readable sacct output:

echo "export SLURM_TIME_FORMAT=relative" >> ~/.bashrc
echo "export SACCT_FORMAT=JobID%-20,Start%-10,Elapsed%-10,State,AllocCPUS%8,MaxRSS,NodeList" >> ~/.bashrc
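When comparing MaxRSS against ReqMem to right-size a job's --mem request, it can help to bring both values to the same unit. The helper below is a hypothetical sketch, not part of SLURM: to_mib is an invented name, and it assumes the values carry a plain K, M, G or T suffix in powers of 1024.

```shell
# Convert a memory value such as "2097152K", "512M" or "4G" to whole MiB.
# Assumes a single trailing unit letter; anything else is reported as an error.
to_mib() {
    awk -v v="$1" 'BEGIN {
        unit = substr(v, length(v), 1)   # last character is the unit
        num = v + 0                      # awk keeps the leading numeric part
        if      (unit == "K") num /= 1024
        else if (unit == "M") num *= 1
        else if (unit == "G") num *= 1024
        else if (unit == "T") num *= 1024 * 1024
        else { print "unrecognized unit in: " v > "/dev/stderr"; exit 1 }
        printf "%d\n", num
    }'
}

to_mib 2097152K   # prints 2048
to_mib 4G         # prints 4096
```

If a job's MaxRSS converts to far fewer MiB than its ReqMem, a smaller --mem request is likely to be enough next time.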