Digital Research Alliance of Canada

The Digital Research Alliance of Canada provides national high-performance computing (HPC) infrastructure for Canadian researchers. This page covers how to get access to the Nibi cluster and use it for research computing.

Getting an Account
Passwordless SSH Setup
Mounting Data (Mac/Linux)
File Transfer
Group Permissions
Job Submission
SLURM Commands
Visualization

1. Getting an Account

Create an account at the CCDB portal and use the following sponsor when prompted:

Sponsor: Ankush Agarwal (CCRI: aju-094-01)

Once your account is approved, log in to Nibi with:

ssh <user>@nibi.alliancecan.ca

Replace <user> with your Alliance Canada username throughout this guide.

2. Passwordless SSH Setup

Generate an SSH key pair on your local machine (skip if you already have one):

ssh-keygen -t rsa

Copy your public key to Nibi so future logins require no password:

ssh-copy-id -i ~/.ssh/id_rsa.pub <user>@nibi.alliancecan.ca
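Once the key is in place, a host alias can shorten the login command. The following is a minimal sketch, not part of the official setup: the alias name nibi and the placeholder your_username are assumptions, and the key path matches the id_rsa key generated above.

```shell
# Append a host alias for Nibi to the SSH client config, only if it is not there yet.
# "nibi" is a hypothetical alias; replace your_username with your Alliance username.
SSH_CONFIG="${SSH_CONFIG:-$HOME/.ssh/config}"
mkdir -p "$(dirname "$SSH_CONFIG")"

if ! grep -qs '^Host nibi$' "$SSH_CONFIG"; then
cat >> "$SSH_CONFIG" <<'EOF'
Host nibi
    HostName nibi.alliancecan.ca
    User your_username
    IdentityFile ~/.ssh/id_rsa
EOF
fi

# The SSH client expects its config file to be writable only by its owner.
chmod 600 "$SSH_CONFIG"
```

With the alias in place, ssh nibi is equivalent to the full ssh <user>@nibi.alliancecan.ca command.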

4. File Transfer

The recommended method for transferring large datasets is Globus, which provides reliable, high-speed transfers and can resume interrupted transfers automatically.

The Globus endpoint for Nibi is:

alliancecan#nibi

5. Group Permissions

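To allow other group members to access your files in the shared project space, run the following commands. The last two make your home directory group-writable; OpenSSH's default StrictModes check refuses key-based logins when the home directory or ~/.ssh is group-writable, so confirm that passwordless login still works afterwards: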

chgrp -R def-ankush ~/projects/*/$USER
chmod -R g+rwXs ~/projects/*/$USER
chgrp -R def-ankush $HOME
chmod -R g+rwXs $HOME
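To confirm that the commands above took effect, the tree can be scanned for entries the group still cannot read or that belong to a different group. This is a rough sketch; both function names are hypothetical helpers, not existing tools.

```shell
# Print every entry under directory $1 that lacks the group read bit.
not_group_readable() {
    find "$1" ! -perm -g=r -print
}

# Print every entry under directory $1 that is not owned by group $2.
wrong_group() {
    find "$1" ! -group "$2" -print
}

# Example usage on the cluster (both should print nothing if permissions are correct):
#   not_group_readable ~/projects/*/"$USER"
#   wrong_group ~/projects/*/"$USER" def-ankush
```

Empty output from both helpers means every file is group-readable and owned by the expected group.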

6. Job Submission

Jobs on Nibi are managed by SLURM. Below are common job templates for typical workloads.

Profile	Cores	Memory	Wall time
Regular	8	32 GB	24 hours
Short	8	32 GB	3 hours
Fat (memory-intensive)	32	128 GB	24 hours

A minimal SLURM batch script looks like:

#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=32G
#SBATCH --time=24:00:00
#SBATCH --account=def-ankush

module load python/3.11
python my_script.py

Submit with sbatch job.sh. For interactive sessions use:

salloc --ntasks=1 --cpus-per-task=4 --mem=16G --time=2:00:00 --account=def-ankush
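The profiles in the table above map directly onto #SBATCH headers. As an illustration, a small helper could generate the header for a given profile; the function name sbatch_header and the lowercase profile keys are hypothetical, while the values come from the table.

```shell
# Print an #SBATCH header for one of the profiles listed in the table.
sbatch_header() {
    profile="$1"
    case "$profile" in
        regular) cpus=8;  mem=32G;  walltime=24:00:00 ;;
        short)   cpus=8;  mem=32G;  walltime=03:00:00 ;;
        fat)     cpus=32; mem=128G; walltime=24:00:00 ;;
        *) echo "unknown profile: $profile" >&2; return 1 ;;
    esac
    cat <<EOF
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=$cpus
#SBATCH --mem=$mem
#SBATCH --time=$walltime
#SBATCH --account=def-ankush
EOF
}

# Example: build a job script for the memory-intensive profile.
sbatch_header fat > fat_job.sh
echo "python my_script.py" >> fat_job.sh
```

The resulting fat_job.sh can then be submitted with sbatch in the same way as the minimal script.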

7. SLURM Commands

Command	Description
`sq`	View your queued and running jobs
`sshare -U -u $USER`	Check your current fair-share usage
`scancel <jobid>`	Cancel a specific job
`scancel -u $USER`	Cancel all your jobs
`sacct -j <jobid> --format JobID,ReqMem,MaxRSS,Timelimit,Elapsed`	Check resource usage of a completed job

Run these commands once to append settings to your ~/.bashrc that make sacct output more readable:

echo "export SLURM_TIME_FORMAT=relative" >> ~/.bashrc
echo "export SACCT_FORMAT=JobID%-20,Start%-10,Elapsed%-10,State,AllocCPUS%8,MaxRSS,NodeList" >> ~/.bashrc
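When comparing MaxRSS against the requested memory, it helps to bring the values to one unit. The sketch below assumes sacct reports memory as a number followed by a K, M, G, or T suffix (or plain bytes when no suffix is present); to_mib is a hypothetical helper name.

```shell
# Convert a memory value such as 2097152K, 512M, or 4G to whole MiB.
to_mib() {
    echo "$1" | awk '{
        value = $1 + 0                        # numeric prefix of the field
        unit = substr($1, length($1), 1)      # last character is the unit, if any
        if (unit == "K") value /= 1024
        else if (unit == "G") value *= 1024
        else if (unit == "T") value *= 1024 * 1024
        else if (unit != "M") value /= 1024 * 1024   # no suffix: treat as bytes
        printf "%.0f\n", value
    }'
}

# Example: a job that peaked at 2097152K used 2048 MiB.
to_mib 2097152K
```

If the peak is far below the requested --mem value, a smaller request will usually let the job start sooner.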