Slurm and Enroot

1 minute read

Finally touching on Slurm system. First heard about during CGG time, and we had some brief discussing of using it for cluster jobs. But our own implemention of MPI base cluster job is efficient enough, and actually it’s actually a self serviced cloud architecture, with remote compute, storage, and all orchestrations tools. So I always have an impression that MPI is better than Slurm.

1 Slurm basic

Apparently the ACCOUNT and PARTITION are important ids in Slurm.

  • sinfo -s check node status
  • squeue --me check the job queue
  • scontrol show job job_id show job details

2 Enroot

Enroot is a tool to run docker without docker. Details of cmd is here

  1. Credential Enroot cred is set at ~/.config/enroot/.credentials file
    machine gitlab-master.nvidia.com login kylhuang password glpat-xxxxx
    machine nvcr.io login $oauthtoken password nvapi-yyyy
    
  2. Run Docker images with Enroot. This need to be done in the srun interactive job environment, which is after running the script 1 in the next job submissino session
    # Create the squash file.
    enroot import --output ${PATH_TO_IMAGE}/phi3-mini.sqsh -- docker://nvcr.io/nim/microsoft/phi-3-mini-4k-instruct:latest
    # Create enroot image with the squash file, under /tmp/enroot-data/user-2001072735/phi3-mini-container 
    enroot create --name phi3-mini-container ${PATH_TO_IMAGE}/phi3-mini.sqsh
    # Run the container
    enroot start phi3-mini-container
    enroot list
    

    If you run enroot without srun interactive job, the import job will fail with enroot-aufs2ovlfs: failed to set capabilities: Operation not permitted errors, and the create will be created under ~/.local/share/enroot/phi3-mini-container and enroot start would fail.

3 Job submission

  1. A job creating an interactive environment ```sh export ACCOUNT=general_sa export JOBNAME=job1

srun -A ${ACCOUNT}
-N1
-J ${JOBNAME}
–partition=interactive
–time=1:00:00 –exclusive
–pty /bin/bash -l

2. A job running a container
```sh
srun -A ${ACCOUNT} \
     -N1 -p interactive \
     -J ${JOBNAME} \
     --container-image=${LOCAL_NIM_CACHE}/phi3-mini.sqsh \
     --container-mounts="${LOCAL_NIM_CACHE}:/opt/nim/.cache:rw" \
     --container-env=NGC_API_KEY \
     --export=ALL --mpi=pmix \
     bash   

4 Job status

You can see job details by scontrol show job job_id:

  • PD (pending)
  • R (running)
  • CA (cancelled)
  • CF(configuring)
  • CG (completing) CD (completed)
  • F (failed), TO (timeout), NF (node failure), RV (revoked) and SE (special exit state)

Tags:

Categories:

Updated: