Access
In order to access our HPC cluster, you need an account on our local system (separate from your URZ login). Students, please ask your supervisor to contact the administrator.
Topology and general information
We are currently providing the following partitions:
Partition | #Nodes | #Cores / Node | RAM / Core | Instruction Set
---|---|---|---|---
cuda* | 8 | 24 | ~5 GB | SSE1-4a
chrom | 12 | 24 | ~4 GB | SSE1-4.2, AVX
sulfur | 2 | 16 | ~4 GB | SSE1-4.2, AVX
calcium | 4 | 20 | ~5 GB | SSE1-4.2, AVX, AVX2, FMA3
magnesium | 1 | 12 | ~5 GB | SSE1-4.2, AVX, AVX2, FMA3
*cuda nodes have an additional NVIDIA GeForce GTX 580 GPU with 512 CUDA cores and 1536 MB GDDR5 memory.
For more detailed information you may run one of the following commands:
- sinfo / sinfo --long -N
- listnodes / listnodes -l
Submitting jobs
In order to run a calculation on our cluster, you must be connected to the login node. This is possible exclusively from within the TUBAF IP range, i.e. from a 139.20.xx.xxx address, and with a valid user account on our infrastructure. For allocating HPC resources, you have two options:
- ask for an interactive login with
# salloc -p $PARTITION -n $NCORES srun --pty /bin/bash
where you have to insert a valid partition name and choose a valid number of cores
- submit a job script with
# sbatch jobscript
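For example, an interactive session with 24 cores on the chrom partition (partition name and core count chosen purely for illustration) could be requested with
# salloc -p chrom -n 24 srun --pty /bin/bash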
Job scripts contain information on the nodes you want to allocate, the maximum time you allow your job to run, and settings for the job name and email notifications. A simple MPI-only job script looks like this:
example slurm job file
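A minimal sketch of such a script (partition, node/core counts, email address, and binary name are placeholders and need to be adapted to your job):

```
#!/bin/bash
#SBATCH --job-name=mpi_example        # job name shown in the queue
#SBATCH --partition=chrom             # partition to run on
#SBATCH --nodes=2                     # number of nodes
#SBATCH --ntasks-per-node=24          # one MPI rank per core (MPI-only)
#SBATCH --time=01:00:00               # maximum run time (hh:mm:ss)
#SBATCH --mail-type=END,FAIL          # email notifications
#SBATCH --mail-user=user@example.com  # placeholder address

mpirun ./my_mpi_binary                # Slurm supplies the rank count
```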
For hybrid (MPI+OpenMP) jobs, make sure you do not oversubscribe nodes, i.e. do not use more ranks x threads in total than the number of CPU cores on a node (e.g. on a 24-core node, 2 ranks x 12 threads is fine, while 2 ranks x 16 threads oversubscribes). The number of OpenMP threads has to be set via the --cpus-per-task option.
Note that you must use mpirun, NOT srun, and that you do not need to specify the number of ranks via 'mpirun -np N'; Slurm handles this automatically using the specifications from the --ntasks-per-node and --nodes parameters.
For a more detailed explanation, please consult the ZIH wiki:
- https://doc.zih.tu-dresden.de/hpc-wiki/bin/view/Compendium/Slurm
Compiling and software access
TODO: introduce lmod system [coming soon...]
Architecture-specific compiler flags / vectorization
In case you want to run your jobs only on a certain partition, e.g. only on chrom, you should use architecture-specific optimization flags for gcc. For this example, we can utilize the AVX instruction set of the AMD Opteron CPUs via -march=bdver1 -mavx, resulting in a performance gain of roughly 17% for Elk TD-DFT calculations. You can find the correct flags using the listnodes -l command.
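As a minimal sketch (source and output file names are placeholders):

```
# optimize for the Bulldozer-based 'chrom' nodes
gcc -O2 -march=bdver1 -mavx -o mycode.out mycode.c
```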
Software recommendations
Elk-LAPW
The current version 5.2.14 cannot be run node-spanning and MPI-only, e.g. with 40 ranks spread over calcium[01-02]. The best scaling that can be reached on our cluster is achieved by using 1 MPI rank per socket and a number of OpenMP threads equal to the number of cores per CPU. This seems to work across multiple nodes as well.
Hybrid example options for 2 'chrom' nodes with 1 rank per socket and 12 cores/CPU (= 24 cores/node):
hybrid job script (excerpt)
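A sketch of the relevant options for this setup (the Elk binary path is a placeholder):

```
#SBATCH --nodes=2                 # two chrom nodes
#SBATCH --ntasks-per-node=2       # one MPI rank per socket
#SBATCH --cpus-per-task=12        # one OpenMP thread per core of each CPU

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
mpirun ./elk                      # rank count is taken from the Slurm settings
```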
CUDA-capable nodes
On the cuda nodes, you will find the nvcc compiler and the nvidia-cuda-toolkit installed. You may use nvcc just like a conventional compiler, e.g. 'nvcc cudasrc.cu -o cuda.out'.
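For instance, compiling and then running on a node of the cuda partition might look like this (file names chosen purely for illustration):
# nvcc cudasrc.cu -o cuda.out
# srun -p cuda -n 1 ./cuda.out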
Titan Black @ Sirius