Information on the PBS Professional batch system

The following commands are available for working with the PBS Pro batch system:

Command ... Description
qsub {parameter} {job script} ... Submit a job
qstat {parameter} {job-id} ... Query the status of one or all of your jobs
qalter {parameter} {job-id} ... Adjust the parameters of a job after submission
qdel {parameter} {job-id} ... Cancel or delete a job

Jobs are transferred to the cluster system using scripts. The script header begins with a shebang line (e.g. "#!/bin/bash"), followed by directives for the PBS batch system. These directives begin with "#PBS":

Statement ... Description
#PBS -N Jobname ... Name of the job
#PBS -l select=X:ncpus=Y:mpiprocs=Y:mem=Zgb ... Resource request regarding hardware
#PBS -l walltime=XX:YY:ZZ ... Resource request regarding runtime
#PBS -l place={scatter|vscatter|pack|free}[:excl|exclhost|shared][:group=host] ... Placement of the chunks:
  scatter ... Each chunk gets its own node
  vscatter ... Each chunk gets its own vnode (corresponds to a processor socket)
  pack ... Everything is calculated on one node
  free ... Arbitrary distribution of the chunks
  excl ... No further job starts on the vnode, even if resources are still free
  exclhost ... The entire node is blocked
  shared ... Standard behaviour; resources are shared with other jobs
  group=host ... All chunks must be placed on the same node
#PBS -m abe ... E-mail notification on cancellation, start and end of the job
#PBS -M your.mailaddress@tu-freiberg.de ... E-mail recipients for notifications
#PBS -o standardoutput.out ... Use an alternative output file (default: "<jobname>.o<job ID>")
#PBS -e erroroutput.out ... Use an alternative error output file (default: "<jobname>.e<job ID>")
#PBS -r y ... "Rerunnable": rerun the job automatically on error (default: "y")
#PBS -q entryq ... Specify the queue (only necessary for GPU and institute nodes)

All parameters can also be specified as qsub parameters on the command line, in which case they override the values in the job script.

After specifying the PBS options mentioned above, the main part of the script should contain at least the following:

  • Software to be loaded
  • Program call with input file and any other parameters
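
Putting these pieces together, a minimal job script might look like the following sketch. The module name, program and input file are placeholders, and the module command assumes an environment-modules setup:

#!/bin/bash
#PBS -N example_job
#PBS -l select=2:ncpus=20:mpiprocs=20:mem=40gb
#PBS -l walltime=02:00:00
#PBS -o job.out
#PBS -e job.err

# Load the required software (module name is a placeholder)
module load openmpi

# Change to the directory the job was submitted from
cd $PBS_O_WORKDIR

# Program call with input file (both are placeholders)
mpirun ./my_program input.dat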

Specific examples can be found under the example scripts.

Explanation of the requested resources

Parameter ... Description
select ... Number of requested "chunks"
ncpus ... Number of cores per chunk (maximum 40 or 64)
mpiprocs ... Number of MPI processes per chunk (maximum twice ncpus, with Hyper-Threading)
ompthreads ... Number of OpenMP threads per process (maximum twice ncpus, with Hyper-Threading)
mem ... Requested memory per chunk (possible units: MB, GB, TB)
ngpus ... Requested GPUs per chunk (at most 1, as there is only one GPU per node)
walltime ... Maximum runtime in the format HHH:MM:SS

Example 1: qsub -l select=2:ncpus=20:mpiprocs=10:ompthreads=2:mem=40GB,walltime=10:00:00 -o job.out -e job.err pbspro.script

Here, a parallel calculation of at most 10 hours is to be performed. The output files are job.out and job.err. The processes are distributed across 2 chunks with 20 cores each, with 10 MPI processes per chunk. To utilise all 20 requested cores per chunk, the application must therefore run with 2 threads per MPI process. The walltime must be given in the format HHH:MM:SS even if zero values are specified; the string is evaluated from right to left:

  • walltime=120:00 therefore does not result in 120 h, but 2 h
  • To obtain 120 h, you must specify walltime=120:00:00

Example 2: qsub -l select=2:ncpus=40:mpiprocs=80:mem=80GB

Two full nodes including the hyper-threads are booked.

Example 3: qsub -l select=1:ncpus=40:ompthreads=80:mem=80GB

Here, an OpenMP application will fully utilise one node.

Example 4: qsub -l select=1:ncpus=40:mpiprocs=40:ompthreads=2:mem=80GB

This example shows the requirement for a hybrid application with 40 MPI ranks and two OpenMP threads per rank.
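
A matching script body might look like this sketch. The program name is a placeholder and the mpirun flags vary with the MPI installation; PBS Pro usually sets OMP_NUM_THREADS from the ompthreads request, but exporting it explicitly does no harm:

# Launch the hybrid application: 40 MPI ranks, 2 OpenMP threads each
export OMP_NUM_THREADS=2
mpirun -np 40 ./my_hybrid_program input.dat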

Interactive jobs

Working interactively on a command line in batch mode allows you to run tests or debug programmes, for example. To do this, use the command qsub -I [options] (capital i). The options are analogous to the PBS instructions mentioned above.

Example: qsub -I -l select=1:ncpus=10:mpiprocs=10:mem=10GB # (first upper case i, then lower case L)

You can also use interactive jobs to display graphical applications on your end device while utilising the computing capacities of the cluster. This requires support from the application. To do this, use qsub -I -X [options]. Beforehand, log in to the login node with X-forwarding activated ("ssh -X mlogin01").
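
A typical session might look like the following sketch; the resource values are examples and the application/module name is a placeholder:

ssh -X mlogin01                          # log in with X-forwarding enabled
qsub -I -X -l select=1:ncpus=4:mem=8GB   # interactive job with X-forwarding
module load my_gui_application           # placeholder: load the required software
my_gui_application                       # the GUI is displayed on your local machine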

Reservations

In order to use system resources exclusively for a certain period of time without being tied to a single job and its waiting time, you can make reservations via PBS. The following commands are used for this.

Command ... Explanation
pbs_rsub ... Create a reservation
pbs_rstat ... Query the status of reservations
pbs_rdel ... Delete a reservation
pbs_ralter ... Change a reservation

The following parameters can be used with pbs_rsub:

Parameter ... Description
-R [[[[CC]YY]MM]DD]hhmm[.SS] ... Start time
-E [[[[CC]YY]MM]DD]hhmm[.SS] ... End time
-D [[HH:]MM:]SS ... Duration of the reservation
-N ... Name of the reservation
-l ... Define the resource quantity
-U ... Comma-separated user list
-m abce ... E-mail notification on cancellation, start, confirmation and end of the reservation
-M ... Comma-separated list of e-mail addresses for notifications

At least two of the three time parameters (-R, -E, -D) must be set.

Example

pbs_rsub -R 0800 -D 06:00:00 -l select=5:ncpus=40:mem=100gb

Explanation

This command requests 5 nodes, each with 40 cores and 100 GB of memory, for 6 hours starting at the next occurrence of 08:00.

Use

The return value can look like this: "R1234.mmaster CONFIRMED". Within the reservation's time window, you then submit your jobs to the "R1234" queue, e.g.:

qsub -q R1234 -l select=5:ncpus=40:mem=100gb

It is possible to split the booked resources across several jobs.

If the requested resources and/or the requested time period are not available, the reservation is rejected with "R1234.mmaster DENIED".
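
Put together, a reservation workflow might look like the following sketch; the reservation ID "R1234" and the job script names are examples:

pbs_rsub -R 0800 -D 06:00:00 -l select=5:ncpus=40:mem=100gb
# -> "R1234.mmaster CONFIRMED"
pbs_rstat                                                   # check the reservation status
qsub -q R1234 -l select=2:ncpus=40:mem=100gb job_a.script   # use part of the reserved resources
qsub -q R1234 -l select=3:ncpus=40:mem=100gb job_b.script   # ...and the rest in a second job
pbs_rdel R1234.mmaster                                      # delete the reservation if no longer needed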

Limitations

Only the requested resources are visible within a job. Overbooking CPU or memory can therefore cause your programmes, and with them the entire job, to crash.