Queue System

This page documents the queue system on the Colossus (HPC for TSD) cluster.

In TSD, each project has its own Linux Virtual Machine (VM) for submitting jobs to the cluster.

Submitting a job

To run a job on the cluster, you submit a job script to the job queue, and the job is started when one or more suitable compute nodes become available. The job queue is managed by Slurm, a queue system (scheduler and resource manager); see Slurm's documentation page for details.

Please note that jobscript names should not contain sensitive information.

Job scripts are submitted with the sbatch command:

sbatch YourJobscript

The sbatch command returns a jobid, an id number that identifies the submitted job. The job will be waiting in the job queue until there are free compute resources it can use. A job in that state is said to be pending (PD). When it has started, it is called running (R). Any output (stdout or stderr) of the job script will be written to a file called slurm-jobid.out in the directory where you ran sbatch, unless otherwise specified.
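If you need the jobid in a script, one option is to pick it out of the message that sbatch normally prints ("Submitted batch job" followed by the id). The helper below is a minimal sketch under that assumption; the function name extract_jobid is hypothetical and not part of Slurm. sbatch also has a --parsable option that prints the id without the surrounding text.

```shell
# Hypothetical helper: extract the numeric job id from sbatch's usual
# "Submitted batch job <id>" message, passed as the first argument.
extract_jobid() {
    # The id is the fourth whitespace-separated field of that message.
    printf '%s\n' "$1" | awk '/^Submitted batch job/ { print $4 }'
}

# Usage on the cluster (YourJobscript as in the text above):
#   jobid=$(extract_jobid "$(sbatch YourJobscript)")
#   echo "Output will be written to slurm-${jobid}.out"
```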

All commands in the job script are performed on the compute-node(s) allocated by the queue system. The script also specifies a number of requirements (memory usage, number of CPUs, run-time, etc.), used by the queue system to find one or more suitable machines for your job.
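As an illustration, a minimal job script might look like the sketch below. The #SBATCH options used (--job-name, --account, --time, --ntasks, --mem-per-cpu) are standard sbatch options, but the job name, the pNN_tsd account placeholder and all requested values are assumptions chosen for the example, not recommendations.

```shell
#!/bin/bash
# Minimal job script sketch. All values below are illustrative assumptions;
# replace pNN_tsd with your own account.
#SBATCH --job-name=example
#SBATCH --account=pNN_tsd
#SBATCH --time=01:00:00
#SBATCH --ntasks=1
#SBATCH --mem-per-cpu=2G

# Slurm sets SLURM_SUBMIT_DIR to the directory where sbatch was run;
# fall back to the current directory when run outside the queue system.
workdir=${SLURM_SUBMIT_DIR:-$PWD}
cd "$workdir" || exit 1

# The actual work of the job goes here.
echo "Job ${SLURM_JOB_ID:-unknown} running on $(hostname) in $workdir"
```

Because the #SBATCH lines are ordinary shell comments, the script can also be run directly in a shell for a quick syntax check before it is submitted.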

You can cancel running or pending (waiting) jobs with scancel:

scancel jobid                # Cancel job with id jobid (as returned from sbatch)
scancel --user=MyUsername    # Cancel all your jobs
scancel --account=MyProject  # Cancel all jobs in MyProject

See man scancel for more details.

Project Quota

On Colossus, each user only has access to a single project, the name of which (pNN) is the prefix of the user name. Projects can have access to up to 3 allocations of computational resources in TSD:

TSD allocation
Sigma2 allocation
Dedicated allocation

TSD allocation

To use this resource, jobs should be submitted with the "--account=pNN_tsd" argument, where pNN is your project number.

Sigma2 allocation

Only TSD projects with a CPU hour quota from Sigma2 can use this pool. We advise any project with substantial computational needs to apply to Sigma2 for CPU (and disk) quota. Sigma2 quotas are valid for 6-month periods, starting April 1 and October 1.

To use this resource, jobs should be submitted with the "--account=pNN" argument, where pNN is your project number. If you submit to this resource but the project does not have a Sigma2 quota, jobs will remain pending (PD) with reason "AssocGrpBillingMinutes".

Dedicated allocation

All other compute and gpu nodes in Colossus were acquired by individual projects for privileged access but are maintained by TSD.

To use this resource, jobs should be submitted with the "--account=pNN_reservationname" argument, where pNN is your project number and reservationname is the name of the dedicated resource.
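The three account naming rules can be summarised in a small helper. This is only a sketch: the function names project_of and account_for are hypothetical, and it assumes user names of the form pNN-name, as in the cost examples on this page.

```shell
# Hypothetical helper: derive the project name (pNN) from a user name
# such as p1337-bartt by cutting at the first "-".
project_of() {
    printf '%s\n' "${1%%-*}"
}

# Hypothetical helper: build the --account value.
# $1: user name; $2: "tsd", "sigma2", or the name of a dedicated resource.
account_for() {
    project=$(project_of "$1")
    case "$2" in
        sigma2) printf '%s\n' "$project" ;;          # Sigma2 allocation: pNN
        *)      printf '%s_%s\n' "$project" "$2" ;;  # pNN_tsd or pNN_reservationname
    esac
}

# Usage on the cluster:
#   sbatch --account="$(account_for "$USER" tsd)" YourJobscript
```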

Inspecting Quota

The command cost can be used to inspect the quota and see how much of it has been used and how much remains. This applies to both the non-accountable UiO quota (listed as non-applicable, NA) and the accountable Sigma2 quota (but not to other private allocations). For projects with an accountable Sigma2 quota, typical output looks like this:

-bash-4.2$ cost
Report for project p1337 on Colossus
Allocation period 2023.1 (2023-04-01 -- 2023-10-01)
Last updated on Mon Jun 12 14:40:21 2023
=============================================================
Account    Description              Billing hours  % of quota
=============================================================
p1337      Used (finished)                 109.15       5.5 %
p1337      Reserved (running)                0.00       0.0 %
p1337      Pending (waiting)                 0.00       0.0 %
p1337      Available                      1890.85      94.5 %
p1337      Quota                          2000.00     100.0 %
=============================================================
=============================================================
Account    Description              Billing hours  % of quota
=============================================================
p1337_tsd  Used (finished)                   9.51          NA
p1337_tsd  Reserved (running)                0.00          NA
p1337_tsd  Pending (waiting)                 0.00          NA
p1337_tsd  Available                           NA          NA
p1337_tsd  Quota                               NA          NA
=============================================================

For the projects without a Sigma2 cpu hour quota, a typical output will look like:

-bash-4.2$ cost
Report for project p77 on Colossus
Allocation period 2023.1 (2023-04-01 -- 2023-10-01)
Last updated on Mon Jun 12 14:25:44 2023
=============================================================
Account    Description              Billing hours  % of quota
=============================================================
p77        Used (finished)                   0.00          NA
p77        Reserved (running)                0.00          NA
p77        Pending (waiting)                 0.00          NA
p77        Available                         0.00          NA
p77        Quota                             0.00          NA
=============================================================
=============================================================
Account    Description              Billing hours  % of quota
=============================================================
p77_tsd    Used (finished)                   0.00          NA
p77_tsd    Reserved (running)                0.00          NA
p77_tsd    Pending (waiting)                 0.00          NA
p77_tsd    Available                           NA          NA
p77_tsd    Quota                               NA          NA
=============================================================
The project does not have a Sigma2 quota for the current period.
See /english/services/it/research/sensitive-data/use-tsd/hpc/queue-system.html#toc2
for information.

Notice that "Available" shows the "Quota" minus the "Used", the "Reserved" and the "Pending" hours. So "Available" is what is expected to be available if all running and pending jobs use the hours they specify. (Usually, jobs specify a longer --time than they actually use, so "Available" will typically increase as jobs finish.)
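This relation can be checked against the p1337 example report above. The snippet is only an arithmetic sketch; the function name available_hours is hypothetical and not part of the cost command.

```shell
# Hypothetical helper: Available = Quota - Used - Reserved - Pending.
# Arguments: QUOTA USED RESERVED PENDING (all in billing hours).
available_hours() {
    awk -v q="$1" -v u="$2" -v r="$3" -v p="$4" \
        'BEGIN { printf "%.2f\n", q - u - r - p }'
}

# Numbers taken from the p1337 report above.
available_hours 2000.00 109.15 0.00 0.00   # prints 1890.85
```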

One can also list the use per user within the project by adding "--details":

cost --details

This will append the quota usage per user:

Report for project p1337 on Colossus
Allocation period 2023.1 (2023-04-01 -- 2023-10-01)
Last updated on Mon Jun 12 14:30:03 2023
=============================================================
Account    Description              Billing hours  % of quota
=============================================================
p1337      Used (finished)                 109.15       5.5 %
p1337      Reserved (running)                0.00       0.0 %
p1337      Pending (waiting)                 0.00       0.0 %
p1337      Available                      1890.85      94.5 %
p1337      Quota                          2000.00     100.0 %
=============================================================
Account    User                Used billing hours  % of quota
=============================================================
p1337      p1337-bartt                      95.36       4.8 %
p1337      p1337-bhm                        13.79       0.7 %
=============================================================
=============================================================
Account    Description              Billing hours  % of quota
=============================================================
p1337_tsd  Used (finished)                   9.51          NA
p1337_tsd  Reserved (running)                0.00          NA
p1337_tsd  Pending (waiting)                 0.00          NA
p1337_tsd  Available                           NA          NA
p1337_tsd  Quota                               NA          NA
=============================================================
Account    User                Used billing hours  % of quota
=============================================================
p1337_tsd  p1337-bartt                       8.48          NA
p1337_tsd  p1337-haatveit                    1.03          NA
=============================================================

See man cost for details about this command.

Accounting

Accounting is done in terms of billing units, and the quota is in billing unit hours. Each job is assigned a number of billing units based on the requested CPUs, memory and GPUs. The number that is subtracted from the quota is the number of billing units multiplied by the (actual) wall time of the job.

The number of billing units of a job is calculated like this:

Each requested CPU is given a cost of 1.
The requested memory is given a cost based on a memory cost factor (see below).
The requested GPU is given a cost based on a GPU cost factor (see below).
The number of billing units is the maximum of the CPU cost, memory cost and GPU cost.

The memory cost factor and the GPU cost factor vary between nodes.

For regular compute nodes, the memory cost factor is 0.12749 units per GiB. Thus the memory cost of a job asking for all memory on a node will be 64, the number of CPUs on the node.
For GPU nodes, the memory cost factor is 0.09775967 units per GiB, and the GPU cost factor is 24 per GPU. This means that a job asking for all memory, or all GPUs, on a node gets a cost of 96, the number of CPUs on the node.
For bigmem nodes, the memory cost factor is 0.0323641 units per GiB. Thus the memory cost of a job asking for all memory on a node will be 128, the number of CPUs on the node.
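The calculation above can be sketched as follows, using the cost factors listed for each node type. The function names billing_units and billing_hours are hypothetical, and the sketch ignores any rounding the queue system may apply, so treat the results as estimates.

```shell
# Hypothetical helper: billing units = max(CPU cost, memory cost, GPU cost).
# Arguments: CPUS MEM_GIB MEM_FACTOR [GPUS GPU_FACTOR]
billing_units() {
    awk -v c="$1" -v m="$2" -v mf="$3" -v g="${4:-0}" -v gf="${5:-0}" 'BEGIN {
        cpu = c + 0      # each requested CPU costs 1
        mem = m * mf     # requested memory (GiB) times the memory cost factor
        gpu = g * gf     # requested GPUs times the GPU cost factor
        max = cpu
        if (mem > max) max = mem
        if (gpu > max) max = gpu
        printf "%.2f\n", max
    }'
}

# Hypothetical helper: billing hours = billing units * wall time in hours.
billing_hours() {
    awk -v u="$1" -v h="$2" 'BEGIN { printf "%.2f\n", u * h }'
}

billing_units 4 64 0.12749            # regular node, memory dominates: 8.16
billing_units 8 100 0.09775967 2 24   # GPU node, GPUs dominate: 48.00
billing_hours 48 2.5                  # 48 units for 2.5 hours: 120.00
```

In the first example the job asks for only 4 CPUs, but its 64 GiB of memory corresponds to about 8 CPUs' worth of a regular node, so memory determines the cost.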

When a project has exceeded the Quota limit, jobs will be left pending with the reason "AssocGrpBillingMinutes".

Inspecting Jobs

To get a quick view of the status of a job, you can use squeue:

squeue -j JobId

where JobId is the job id number that sbatch returns. To see more details about a job, use

scontrol show job JobId

See man squeue and man scontrol for details about these commands.
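For use in scripts, the state of a job can be read with squeue's -h (no header) and -o (output format) options, where the format %t gives the compact state code. The sketch below assumes that squeue prints nothing once a job has left the queue; the function names job_state and wait_for_job and the default polling interval are hypothetical choices.

```shell
# Hypothetical helper: print the compact state code (PD, R, ...) of a job.
# Prints nothing if squeue no longer lists the job.
job_state() {
    squeue -h -j "$1" -o '%t' 2>/dev/null
}

# Hypothetical helper: block until the job is no longer listed by squeue.
# $1: job id; $2: seconds between checks (default 60, an arbitrary choice).
wait_for_job() {
    while [ -n "$(job_state "$1")" ]; do
        sleep "${2:-60}"
    done
}

# Usage on the cluster:
#   wait_for_job JobId && echo "Job JobId has left the queue"
```

Keep the polling interval generous, since every check is a request to the queue system.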

Inspecting the Job Queue

There are several available commands to inspect the job queue:

squeue: list jobs in the queue
pending: list the pending (waiting) jobs in the queue
qsumm: show summary of queue usage

To see the list of running or pending jobs in the queue, use the command squeue. squeue will only show the jobs in your own project. Useful squeue options:

[-j jobids]   show only the specified jobs
[-w nodes]    show only jobs on the specified nodes
[-t states]   show only jobs in the specified states (pending, running,
             suspended, etc.)
[-u users]    show only jobs belonging to the specified users

All specifications can be comma separated lists. See man squeue for details. Examples:

squeue -j 14132,14133    # shows jobs 14132 and 14133
squeue -w c1-11          # shows jobs running on c1-11
squeue -u foo -t PD      # shows pending jobs belonging to user 'foo'

Squeue status (ST)
Status	Text
PD	Pending
R	Running
S	Suspended
CG	Completing
CD	Completed
CF	Configuring
CA	Cancelled
F	Failed
TO	Timeout
PR	Preempted
NF	Node failed

You can use the pending command to list only the pending jobs. It lists the jobs in descending priority order, and includes an estimate of when the job will start, when such an estimate is available. pending is simply a wrapper around squeue, and accepts all options that squeue takes. It will also just show the jobs belonging to your own project.

To see the resource situation of the cluster, use the command qsumm. It shows how many CPUs (or rather, Processor Equivalents) are used by running jobs and are requested by pending jobs. The output has two lines, one for your project, and one showing the total usage for all projects (including your project). An example output:

--------------------------------
Account        Limit  nRun nPend
--------------------------------
p11             1536    20    11 
Total           1536  1550   200
--------------------------------

See qsumm --help for explanations of each column. The output is updated every 5 minutes, so it can take a couple of minutes after jobs are submitted/started/finished before it shows in the qsumm output.

General Job Limitations

Default values when nothing is specified

1 core (CPU)

The rest (time, mem per cpu, etc.) must be specified.

Limits

The max wall time is 4 weeks, but do not submit jobs that will run for more than 7 days unless they implement checkpointing: none of the nodes on Colossus have dual power, and we reserve the right to shut down any node for maintenance at any time with 7 days' notice!
Max 4500 submitted jobs (running or pending) per project (--account) at the same time.
Max size of job arrays: 4000. That also means that the largest job array index is 4000. Note that a job array of size N counts as N jobs with respect to the total number of submitted jobs per project.
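Before submitting a large job array, it can be useful to check it against these two limits. The sketch below hard-codes the limits stated above; the variable and function names are hypothetical, and the limits themselves may change.

```shell
# Limits as stated on this page (assumed current; adjust if they change).
MAX_JOBS_PER_PROJECT=4500
MAX_ARRAY_SIZE=4000

# Hypothetical helper: succeed if an array of ARRAY_SIZE tasks fits within
# both limits, given CURRENT_JOBS already running or pending in the project.
# Arguments: CURRENT_JOBS ARRAY_SIZE
can_submit_array() {
    [ "$2" -le "$MAX_ARRAY_SIZE" ] &&
        [ $(( $1 + $2 )) -le "$MAX_JOBS_PER_PROJECT" ]
}

# Usage on the cluster, counting one line per job and per array task
# (squeue -h drops the header, -r lists array tasks individually):
#   current=$(squeue -h -r | wc -l)
#   can_submit_array "$current" 3000 && sbatch --array=1-3000 YourJobscript
```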

Scheduling

Jobs are started by priority. Pending jobs are prioritized according to

Queue time
How many jobs each user has pending in the queue

Colossus starts jobs by priority with backfilling, so small, short jobs can start earlier than jobs with higher priority, as long as they do not delay the higher priority jobs. In addition, we limit how many of a user's jobs increase in priority over time. This prevents a single user from blocking all other users by submitting a large number of jobs at the same time. In effect, priority increases for users running few jobs relative to users running many jobs. This is a trade-off, and we will adjust the limit (currently 10) if we see that the effect is too large or too small.


Published June 21, 2021 10:35 AM - Last modified June 21, 2023 1:39 PM