Running multiple MD tasks on a GPU workstation
In this example, we show how to run multiple MD tasks on a GPU workstation. This workstation does not have any job scheduling software installed, so we use `Shell` as the `batch_type`.
```json
{
    "batch_type": "Shell",
    "local_root": "./",
    "remote_root": "/data2/jinzhe/dpgen_workdir",
    "clean_asynchronously": true,
    "context_type": "SSHContext",
    "remote_profile": {
        "hostname": "mandu.iqb.rutgers.edu",
        "username": "jz748",
        "port": 22
    }
}
```
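Before handing a machine configuration like this to the dispatcher, it can be sanity-checked with the standard library alone. The sketch below (the dict mirrors the example above; nothing here calls DPDispatcher itself) round-trips the settings through JSON to confirm they are well-formed:

```python
import json

# Machine configuration from the example above: Shell batch type over SSHContext.
machine_config = {
    "batch_type": "Shell",
    "local_root": "./",
    "remote_root": "/data2/jinzhe/dpgen_workdir",
    "clean_asynchronously": True,
    "context_type": "SSHContext",
    "remote_profile": {
        "hostname": "mandu.iqb.rutgers.edu",
        "username": "jz748",
        "port": 22,
    },
}

# Round-trip through JSON to confirm the config serializes cleanly.
loaded = json.loads(json.dumps(machine_config, indent=2))
assert loaded["batch_type"] == "Shell"
assert loaded["remote_profile"]["port"] == 22
```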
The workstation has 48 CPU cores and 8 NVIDIA RTX 3090 GPUs. We want each GPU to run 6 tasks concurrently, since a single task does not consume many GPU resources. Thus, `strategy/if_cuda_multi_devices` is set to `true` and `para_deg` is set to 6.
```json
{
    "number_node": 1,
    "cpu_per_node": 48,
    "gpu_per_node": 8,
    "queue_name": "shell",
    "group_size": 0,
    "strategy": {
        "if_cuda_multi_devices": true
    },
    "source_list": [
        "activate /home/jz748/deepmd-kit"
    ],
    "envs": {
        "OMP_NUM_THREADS": 1,
        "TF_INTRA_OP_PARALLELISM_THREADS": 1,
        "TF_INTER_OP_PARALLELISM_THREADS": 1
    },
    "para_deg": 6
}
```
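With `gpu_per_node` 8 and `para_deg` 6, up to 8 × 6 = 48 tasks run at once, matching the 48 CPU cores. A minimal sketch of how tasks could be spread across devices (round-robin assignment via `CUDA_VISIBLE_DEVICES` is an illustrative assumption here, not necessarily DPDispatcher's exact scheduling):

```python
gpu_per_node = 8
para_deg = 6
max_concurrent = gpu_per_node * para_deg  # 48, matching cpu_per_node

# Illustrative round-robin: task i would run on GPU i % gpu_per_node
# (e.g. by setting CUDA_VISIBLE_DEVICES), so each GPU hosts para_deg
# tasks at a time.
assignments = {gpu: [] for gpu in range(gpu_per_node)}
for task_id in range(max_concurrent):
    assignments[task_id % gpu_per_node].append(task_id)
```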
Note that `group_size` should be set to 0 (meaning infinity), so that all tasks are grouped into a single job; this avoids multiple jobs running and competing for the GPUs at the same time.
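The effect of `group_size` on the number of jobs can be sketched as ceiling division, with 0 treated as "all tasks in one group" (the helper `num_jobs` is hypothetical, for illustration only):

```python
import math

def num_jobs(n_tasks: int, group_size: int) -> int:
    """Illustrative job count for a given group_size (0 means infinity)."""
    if group_size == 0:
        # "Infinite" group size: every task lands in a single job.
        return 1
    return math.ceil(n_tasks / group_size)

# With group_size 0, all 48 tasks share one job, so no second job
# ever competes for the same 8 GPUs.
print(num_jobs(48, 0))   # 1
print(num_jobs(48, 10))  # 5
```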