Running multiple MD tasks on a GPU workstation
In this example, we show how to run multiple MD tasks on a GPU workstation. This workstation does not have any job scheduling software installed, so we use `Shell` as the `batch_type`.
```json
{
    "batch_type": "Shell",
    "local_root": "./",
    "remote_root": "/data2/jinzhe/dpgen_workdir",
    "clean_asynchronously": true,
    "context_type": "SSHContext",
    "remote_profile": {
        "hostname": "mandu.iqb.rutgers.edu",
        "username": "jz748",
        "port": 22
    }
}
```
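Before handing a machine configuration like this to the dispatcher, it can be sanity-checked with the standard library alone. The sketch below (the dict mirrors the example above; nothing here calls DPDispatcher itself) round-trips the settings through JSON to confirm they are well-formed:

```python
import json

# Machine configuration from the example above: Shell batch type over SSHContext.
machine_config = {
    "batch_type": "Shell",
    "local_root": "./",
    "remote_root": "/data2/jinzhe/dpgen_workdir",
    "clean_asynchronously": True,
    "context_type": "SSHContext",
    "remote_profile": {
        "hostname": "mandu.iqb.rutgers.edu",
        "username": "jz748",
        "port": 22,
    },
}

# Round-trip through JSON to confirm the config serializes cleanly.
loaded = json.loads(json.dumps(machine_config, indent=2))
assert loaded["batch_type"] == "Shell"
assert loaded["remote_profile"]["port"] == 22
```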
The workstation has 48 CPU cores and 8 NVIDIA RTX 3090 GPUs. We want each GPU to run 6 tasks concurrently, since a single task does not consume many GPU resources. Thus, `strategy/if_cuda_multi_devices` is set to `true` and `para_deg` is set to 6.
```json
{
    "number_node": 1,
    "cpu_per_node": 48,
    "gpu_per_node": 8,
    "queue_name": "shell",
    "group_size": 0,
    "strategy": {
        "if_cuda_multi_devices": true
    },
    "source_list": [
        "activate /home/jz748/deepmd-kit"
    ],
    "envs": {
        "OMP_NUM_THREADS": 1,
        "TF_INTRA_OP_PARALLELISM_THREADS": 1,
        "TF_INTER_OP_PARALLELISM_THREADS": 1
    },
    "para_deg": 6
}
```
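With `gpu_per_node` 8 and `para_deg` 6, up to 8 × 6 = 48 tasks run at once, matching the 48 CPU cores. A minimal sketch of how tasks could be spread across devices (round-robin assignment via `CUDA_VISIBLE_DEVICES` is an illustrative assumption here, not necessarily DPDispatcher's exact scheduling):

```python
gpu_per_node = 8
para_deg = 6
max_concurrent = gpu_per_node * para_deg  # 48, matching cpu_per_node

# Illustrative round-robin: task i would run on GPU i % gpu_per_node
# (e.g. by setting CUDA_VISIBLE_DEVICES), so each GPU hosts para_deg
# tasks at a time.
assignments = {gpu: [] for gpu in range(gpu_per_node)}
for task_id in range(max_concurrent):
    assignments[task_id % gpu_per_node].append(task_id)
```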
Note that `group_size` should be set to 0 (meaning infinity), so that all tasks are grouped into a single job; this avoids multiple jobs running and competing for the GPUs at the same time.
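The effect of `group_size` on the number of jobs can be sketched as ceiling division, with 0 treated as "all tasks in one group" (the helper `num_jobs` is hypothetical, for illustration only):

```python
import math

def num_jobs(n_tasks: int, group_size: int) -> int:
    """Illustrative job count for a given group_size (0 means infinity)."""
    if group_size == 0:
        # "Infinite" group size: every task lands in a single job.
        return 1
    return math.ceil(n_tasks / group_size)

# With group_size 0, all 48 tasks share one job, so no second job
# ever competes for the same 8 GPUs.
print(num_jobs(48, 0))   # 1
print(num_jobs(48, 10))  # 5
```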