Connecting to ESRF compute infrastructure¶
- You cannot connect directly to compute nodes (except beamline-dedicated machines).
- The SLURM task manager has to be used.
- Once connected, you have to ask for resources (number of CPU cores, GPUs, memory, time, ...)
- Use `salloc` or `sbatch`
In this training, we'll use interactive SLURM sessions: `salloc [...] srun --pty bash`
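For non-interactive work, the `sbatch` route can be sketched as a job script. The partition, reservation, and resource values below are illustrative examples, not prescribed settings:

```shell
#!/bin/bash
# Illustrative SLURM batch script: submit with `sbatch myjob.sh`.
# All values are examples; adapt partition, time, and resources to your needs.
#SBATCH --partition=p9gpu       # one of the partitions listed in this page
#SBATCH --time=01:00:00         # wall-clock time limit
#SBATCH --mem=128G              # memory per node
#SBATCH -c 64                   # number of CPU cores
#SBATCH --gres=gpu:1            # request one GPU

# Placeholder payload; replace with your own processing command.
nvidia-smi
```

With `sbatch`, the resource request and the work to run travel together in one file, whereas `salloc` gives you the resources first and a shell second.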
Connection for the training¶
from visa.esrf.fr¶
- connect to visa.esrf.fr
- create a new instance with a GPU
- connect to the instance
using reservation (legacy)¶
Each participant will have half a machine from the `p9gpu` partition (IBM AC922: 600 GB RAM, 2× NVIDIA Tesla V100 GPUs, 128-core Power9 CPU)
Connect to a SLURM front-end:
ssh -XC cluster-access
Ask for SLURM resources using the training reservation:
salloc --reservation={tomo4|tomo11} --x11 --time=12:00:00 --mem=128G --cores-per-socket 16 -c 64 --gres=gpu:1 srun --pty bash -l
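Once the interactive session has started, a few quick checks confirm what SLURM actually granted (a sketch; the exact output depends on the node you landed on):

```shell
# Which node did we land on, and under which job?
hostname
echo "Job ID: $SLURM_JOB_ID, CPUs per task: $SLURM_CPUS_PER_TASK"

# Is the requested GPU visible?
nvidia-smi
```

`SLURM_JOB_ID` and `SLURM_CPUS_PER_TASK` are set by SLURM inside the allocation, so they are a quick way to verify that `-c 64` and the other options took effect.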
Activating the tomo-tools software¶
By default, the tomo-tools are not available; you have to enter a command to use them:
using 'modules'¶
module load tomotools/{version}
using '/scisoft/tomotools/activate'¶
source /scisoft/tomotools/activate {version}
The version for the training will be `training_2024`. In everyday use it will more often be `dev` or `stable`.
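After activation, you can check that the tools are on your `PATH`. `nabu` is used as an example here (see the wiki link at the bottom of this page); the exact set of commands provided by a given tomo-tools version may differ:

```shell
# Activate a version (either mechanism above works; `stable` is an example)
source /scisoft/tomotools/activate stable

# Check that a tomo-tool, e.g. nabu, now resolves to the activated install
which nabu
nabu --version   # assumed flag; use `nabu --help` if unavailable
```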
Supported ESRF nodes¶
The tomo-tools are installed on the following nodes:
General-purpose partitions¶
| Partition (machine) | Nodes | CPU & GPU hardware | Cuda information | Notes |
|---|---|---|---|---|
| p9gpu partition (p9-??) | 14 | Power9, V100 | Cuda 10.1, driver 418.126 | ppc64le architecture |
| gpu partition (gpu4-??) | 3 | EPYC 7543, A40 | Cuda 11.6, driver 470.141 | Cuda 11, now needs `module load cuda` |
Beamlines partitions¶
Some of these machines are made available for general use, but priority is given to the beamline operations.
| Partition (machine) | Nodes | CPU & GPU hardware | Cuda information | Notes |
|---|---|---|---|---|
| bm18 (gpbm18-??) | 6 | EPYC 75F3, A40 | Cuda 10.1, driver 418.126 | |
| id16a (gpid16a-180?) | 2 | Xeon E5-2670, K20m | Cuda 11.4, driver 470.141 (!) | |
| id16axni (gpid16axni-??) | 5 | EPYC 7543, A100 | Cuda 11.4, driver 470.141 | Cuda 11, now needs `module load cuda` |
Beamlines machines ("LBS")¶
These machines are used for online data processing and visualization. They were previously named LBS (local buffer storage).
| Machine name | Nodes | CPU & GPU hardware | Cuda information | Notes |
|---|---|---|---|---|
| iccbm051 | 1 | Xeon E5-2697, Quadro K5200, K5c | Cuda 11.4/11.6, driver 470.103 | |
| lid16nagpu1 | 1 | Xeon Gold 6154, Titan X | Cuda 11, driver ?? | ID16B LBS |
| iccbm18? | 2 | EPYC 75F3, A40 | Cuda 11.2, driver 460.91 | Needs `module load cuda` |
| iccid191 | 1 | EPYC 75F3, A40 | Cuda 11.4/11.6, driver 470.141 | Cuda 11, now needs `module load cuda` |
| lbs191 | 1 | Xeon E5-2697, 1080 Ti | Cuda 10.1, driver 470.161.03 | |
See also: https://gitlab.esrf.fr/tomotools/nabu/-/wikis/ESRF-machines