Information on the PBS Professional batch system
The following commands are available for working with the PBS Pro batch system:
Command | Description |
---|---|
qsub {parameter} {job script} | Submit a job |
qstat {parameter} {job-id} | Query the status of one/all of your jobs |
qalter {parameter} {job-id} | Adjust the parameters of a job subsequently |
qdel {parameter} {job-id} | Cancel or delete a job |
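A typical command-line session using these four commands might look like the following sketch (the script name and the job ID "1234" are made-up placeholders; qsub prints the actual ID on submission):

```shell
# Submit a job script; qsub prints the assigned job ID
qsub pbspro.script

# Query the status of that job, or list all of your own jobs
qstat 1234
qstat -u $USER

# Adjust a parameter of the still-queued job, e.g. raise the walltime
qalter -l walltime=20:00:00 1234

# Cancel or delete the job
qdel 1234
```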
Jobs are submitted to the cluster system using scripts. The script header begins with a shebang line (e.g. "#!/bin/bash") and then contains directives for the PBS batch system. These directives begin with "#PBS":
Statement | Description |
---|---|
#PBS -N Jobname | Name of the job |
#PBS -l select=X:ncpus=Y:mpiprocs=Y:mem=Zgb | Resource request regarding hardware |
#PBS -l walltime=XX:YY:ZZ | Resource request regarding runtime |
#PBS -l place=[arrangement][:sharing][:grouping] | Placement of the chunks. Arrangement: scatter ... each chunk gets its own node; vscatter ... each chunk gets its own vnode (corresponds to a processor socket); pack ... everything is calculated on one node; free ... arbitrary distribution of the chunks. Sharing: excl ... no further job starts on the vnode, even if resources are still free; exclhost ... the entire node is blocked; shared ... default behaviour, resources are shared with other jobs. Grouping: group=host ... all chunks must be on the same node |
#PBS -m abe | E-mail notification on abort (a), begin (b) and end (e) of the job |
#PBS -M your.mail.address@tu-freiberg.de | E-mail recipient(s) for notifications |
#PBS -o standardoutput.out | Use an alternative output file Default value: "<jobname>.o<job-ID>" |
#PBS -e erroroutput.out | Use an alternative error output file Default value: "<jobname>.e<job-ID>" |
#PBS -r y | "Rerunnable": rerun the job automatically after a failure Default value: "y" |
#PBS -q entryq | Specify the queue Only necessary for GPU and institute nodes |
All parameters can also be specified as qsub parameters on the command line; in that case they override the values in the job script.
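For instance, a value given on the command line takes precedence over the corresponding directive in the script header (the script name and job name below are placeholders):

```shell
# job.sh contains "#PBS -N analysis" in its header,
# but the command-line option wins, so the job runs under the name "testrun":
qsub -N testrun job.sh
```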
After specifying the PBS options mentioned above, the main part of the script should contain at least the following:
- Software to be loaded
- Program call with input file and any other parameters
Specific examples can be found under the example scripts.
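Putting the pieces together, a minimal job script might look like the following sketch. The module name, program name and input file are placeholders to be adapted to your application; PBS_O_WORKDIR is the directory the job was submitted from:

```shell
#!/bin/bash
#PBS -N myjob
#PBS -l select=1:ncpus=4:mpiprocs=4:mem=8gb
#PBS -l walltime=01:00:00
#PBS -m abe

# Change to the directory the job was submitted from
cd $PBS_O_WORKDIR

# Software to be loaded (module name is a placeholder)
module load mysoftware

# Program call with input file (placeholders)
mpirun -np 4 ./myprogram input.dat
```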
Explanation of the requested resources
Parameter | Description |
---|---|
select | Number of requested "chunks". |
ncpus | Number of cores per chunk. Can be at most 40 or 64 (depending on the node type). |
mpiprocs | Number of MPI processes per chunk. Can be at most twice the number of ncpus (hyper-threading). |
ompthreads | Number of OpenMP threads per MPI process. Can be at most twice the number of ncpus (hyper-threading). |
mem | Requested memory per chunk. Possible units: MB, GB, TB |
ngpus | Requested GPUs per chunk. Can be at most "1", as there is only one GPU per node |
walltime | Maximum runtime in the format HHH:MM:SS |
Example 1: qsub -l select=2:ncpus=20:mpiprocs=10:ompthreads=2:mem=40GB,walltime=10:00:00 -o job.out -e job.err pbspro.script
A parallel calculation with a maximum runtime of 10 hours is performed here. The output files are job.out and job.err. The processes are distributed across 2 chunks with 20 cores each, each chunk running 10 MPI processes. To utilise all 20 requested cores per chunk, the application must therefore run with 2 threads per MPI process. The walltime must be given in the format HHH:MM:SS even if zero values are specified; the string is evaluated from right to left.
- walltime=120:00 therefore does not result in 120 h, but 2 h
- To obtain 120 h, you must specify walltime=120:00:00
Example 2: qsub -l select=2:ncpus=40:mpiprocs=80:mem=80GB
Two full nodes including the hyper-threads are booked.
Example 3: qsub -l select=1:ncpus=40:ompthreads=80:mem=80GB
Here, an OpenMP application will fully utilise one node.
Example 4: qsub -l select=1:ncpus=40:mpiprocs=40:ompthreads=2:mem=80GB
This example shows the requirement for a hybrid application with 40 MPI ranks and two OpenMP threads per rank.
Interactive jobs
Working interactively on a command line in batch mode allows you to run tests or debug programs, for example. To do this, use the command qsub -I [options] (capital i). The options are analogous to the PBS directives mentioned above.
Example: qsub -I -l select=1:ncpus=10:mpiprocs=10:mem=10GB # (first upper case i, then lower case L)
You can also use interactive jobs to display graphical applications on your end device while utilising the computing capacities of the cluster. The application itself must support this. To do this, use qsub -I -X [options]. Log in to the login node with X-forwarding enabled beforehand ("ssh -X mlogin01").
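A complete session for a graphical interactive job might look like this sketch (the resource request is an example; adapt it to your needs):

```shell
# Log in to the login node with X-forwarding enabled
ssh -X mlogin01

# Request an interactive session with X-forwarding (capital I and capital X)
qsub -I -X -l select=1:ncpus=4:mem=8gb

# Once the session starts, graphical programs launched here
# are displayed on your local machine
```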
Reservations
In order to use system resources exclusively for a certain period of time without being tied to a single job and its waiting time, you can make reservations via PBS. The following commands are used for this.
Command | Explanation |
---|---|
pbs_rsub | Create a reservation |
pbs_rstat | Query the status of reservations |
pbs_rdel | Delete a reservation |
pbs_ralter | Change a reservation |
The following parameters can be used with pbs_rsub:
Parameters | Description |
---|---|
-R [[[[CC]YY]MM]DD]hhmm[.SS] | Start time |
-E [[[[CC]YY]MM]DD]hhmm[.SS] | End time |
-D [[HH:]MM:]SS | Duration of the reservation |
-N | Name of the reservation |
-l | Define resource quantity |
-U | Comma-separated user list |
-m abce | E-mail notification on abort, begin, confirmation and end of the reservation |
-M | Comma-separated list of email addresses for notifications |
At least two of the three time parameters (-R, -E, -D) must be set.
Example
pbs_rsub -R 0800 -D 06:00:00 -l select=5:ncpus=40:mem=100gb
Explanation
This command requests 5 nodes, each with 40 cores and 100 GB of memory, for 6 hours starting at the next occurrence of 08:00.
Use
The return value can look like this: "R1234.mmaster CONFIRMED". During the reservation window you then submit your jobs to the "R1234" queue, e.g.: qsub -q R1234 -l select=5:ncpus=40:mem=100gb
It is possible to split the booked resources across several jobs.
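For example, the reserved resources from above could be consumed by several smaller jobs instead of one large one (queue name and script names are placeholders matching the example reservation):

```shell
# Two jobs that together use the 5 reserved nodes
qsub -q R1234 -l select=3:ncpus=40:mem=100gb part1.script
qsub -q R1234 -l select=2:ncpus=40:mem=100gb part2.script
```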
If the requested resources and/or the requested time period are not available, the reservation is rejected with "R1234.mmaster DENIED".
Limitations
Only the requested resources are visible within a job. Overbooking CPU or memory can therefore cause your programs, and thus the entire job, to crash.
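Inside a running job you can inspect which resources were actually assigned via the environment that PBS sets up. This is a sketch; the exact variables available can depend on the PBS configuration:

```shell
# List the (v)nodes assigned to this job, one line per MPI slot
cat $PBS_NODEFILE

# Number of cores assigned on this node
echo $NCPUS

# Full job status, including the granted resource list
qstat -f $PBS_JOBID
```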