Submitting a job

Slurm

We use Slurm for cluster/resource management and job scheduling. Slurm is responsible for allocating resources to users, providing a framework for starting, executing, and monitoring work on allocated resources, and scheduling work for future execution.

Slurm Commands

Command             Description
sinfo               View information about Slurm nodes and partitions
squeue              Display information about jobs in the queue
sbatch <script>     Submit a batch job script to Slurm
scancel <job_id>    Cancel a specific job by its job ID
srun <command>      Run a command interactively or in a job
salloc              Allocate resources for an interactive job
scontrol            View or modify Slurm configuration and state
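
For example, you can list only your own jobs, inspect a specific job in detail, or cancel a job you no longer need (the job ID 68 below is just a placeholder):

$ squeue -u $USER         # show only your jobs
$ scontrol show job 68    # detailed state of job 68
$ scancel 68              # cancel job 68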

Slurm Partitions

Shown below are the Slurm partitions on which you can run your jobs. Use cpu-queue or cpu-mem-queue for CPU-only jobs, and use gpu-queue if you want to run on GPU nodes.

$ sinfo
PARTITION     AVAIL  TIMELIMIT  NODES  STATE NODELIST
cpu-queue*       up   infinite     19  idle~ cpu-queue-dy-c5x4-[1,3-20]
cpu-queue*       up   infinite      1  alloc cpu-queue-dy-c5x4-2
cpu-mem-queue    up   infinite     20  idle~ cpu-mem-queue-dy-r6x4-[1-20]
gpu-queue        up   infinite     22  idle~ gpu-queue-dy-g4x4-[1-20],gpu-queue-dy-g6x12-[1-2]
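
You can restrict sinfo to a single partition, or target a partition explicitly when submitting, using -p/--partition. The wrapped command below is only a placeholder, and depending on how the cluster is configured GPU jobs may additionally need a GPU request such as --gres=gpu:1:

$ sinfo -p gpu-queue
$ sbatch -N1 -p gpu-queue --wrap="nvidia-smi"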

To learn more about our HPC system, see here.

Shown below are some simple examples to get started.

$ sbatch -N1 -p cpu-queue --wrap="hostname"
Submitted batch job 68

You can check the status of the job using scontrol show job <JOBID>. If it shows JobState=CONFIGURING, Slurm is waiting for a compute node to be provisioned before the job can run, so some delay before the job is dispatched is expected.

$ scontrol show job 68
JobId=68 JobName=wrap
   UserId=shahzebsiddiqui93358008(1170) GroupId=shahzebsiddiqui93358008(1170) MCS_label=N/A
   Priority=1 Nice=0 Account=(null) QOS=normal
   JobState=CONFIGURING Reason=None Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
   RunTime=00:01:48 TimeLimit=UNLIMITED TimeMin=N/A
   SubmitTime=2025-07-17T02:45:27 EligibleTime=2025-07-17T02:45:27
   AccrueTime=2025-07-17T02:45:27
   StartTime=2025-07-17T02:45:27 EndTime=Unknown Deadline=N/A
   SuspendTime=None SecsPreSuspend=0 LastSchedEval=2025-07-17T02:45:27 Scheduler=Main
   Partition=cpu-queue AllocNode:Sid=ip-10-188-48-105:1814577
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=cpu-queue-dy-c5x4-1
   BatchHost=cpu-queue-dy-c5x4-1
   NumNodes=1 NumCPUs=1 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   ReqTRES=cpu=1,mem=31129M,node=1,billing=1
   AllocTRES=cpu=1,node=1,billing=1
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
   MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0
   Features=(null) DelayBoot=00:00:00
   OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=(null)
   WorkDir=/camber/home/shahzebsiddiqui93358008
   StdErr=/camber/home/shahzebsiddiqui93358008/slurm-68.out
   StdIn=/dev/null
   StdOut=/camber/home/shahzebsiddiqui93358008/slurm-68.out
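
While the node is being provisioned you can also keep an eye on the job with squeue; once the job no longer appears in the queue, it has finished:

$ squeue -j 68        # show job 68 while it is pending or running
$ squeue -u $USER     # or list all of your jobs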

Once the job is complete, you can view the result:

$ cat slurm-68.out
cpu-queue-dy-c5x4-1
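
By default the output goes to slurm-<JOBID>.out in the directory you submitted from. You can pick the job name and output file yourself with -J/--job-name and -o/--output (the names below are just examples; %j expands to the job ID):

$ sbatch -N1 -p cpu-queue -J hostname-test -o hostname_%j.out --wrap="hostname"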

Job Script Example

Shown below is an example MPI hello world program, hello.c, together with a Slurm job script, hello.slurm.

Hello World MPI Example (hello.c)
#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv) {
    // Initialize the MPI environment
    MPI_Init(&argc, &argv);

    // Get the rank of the process
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // Get the total number of processes
    int size;
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    // Print message from each process
    printf("Hello from process %d of %d\n", rank, size);

    // Finalize the MPI environment
    MPI_Finalize();
    return 0;
}

Slurm Job Script (hello.slurm)
#!/bin/bash
#SBATCH --job-name=mpi_hello
#SBATCH --output=mpi_hello_%j.out
#SBATCH --error=mpi_hello_%j.out
#SBATCH --ntasks=4
#SBATCH --ntasks-per-node=2
#SBATCH --nodes=2
#SBATCH --time=00:05:00

module load openmpi
mpicc -o mpi_hello hello.c
srun --mpi=pmix ./mpi_hello

The Slurm script runs 4 MPI tasks (--ntasks=4) with 2 tasks per node (--ntasks-per-node=2) across 2 nodes (--nodes=2).

We load the openmpi module so that the MPI compiler wrapper is available, compile the source code hello.c into the executable mpi_hello using mpicc, and then launch the executable via srun.
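
If you prefer to experiment interactively, the same steps can be run inside an allocation obtained with salloc (a sketch, assuming the same openmpi module is available and the default partition is suitable; add -p <partition> if needed):

$ salloc --nodes=2 --ntasks=4 --ntasks-per-node=2 --time=00:05:00
$ module load openmpi
$ mpicc -o mpi_hello hello.c
$ srun --mpi=pmix ./mpi_hello
$ exit    # release the allocation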

Let’s submit the job via sbatch and keep track of the job ID; in this example the job ID is 323:

$ sbatch hello.slurm
Submitted batch job 323

Once the job has finished, you can inspect it via scontrol show job <JOBID>. This job writes its output to mpi_hello_323.out, whose contents are shown below. We see 4 processes, each printing a Hello message from its MPI task:

$ cat mpi_hello_323.out
Hello from process 2 of 4
Hello from process 3 of 4
Hello from process 0 of 4
Hello from process 1 of 4
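
If a job misbehaves or is no longer needed, you can cancel it with scancel, and you can confirm a finished job's state with scontrol (completed jobs remain visible to scontrol only for a short time after they end):

$ scontrol show job 323 | grep -E 'JobState|ExitCode'
$ scancel 323    # only needed while the job is still pending or running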

Application-specific instructions are available in the Application Support and Build Guidance section.
