RE: Slurm squeue: see number of gpus
By default `squeue` does not show GPUs, so you need a custom output format that includes the generic resource (GRES) field. Here's what you can do:
1. Plain `squeue` lists your jobs but has no GPU column:
```bash
squeue -u $USER
```
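2. To see the number of GPUs requested, add the GRES field to a format string passed with `-o`/`--format` (`-O`/`--Format` is a separate option that takes field names rather than `%` codes):
```bash
squeue -u $USER -o "%.18i %.9P %.8j %.8u %.2t %.10M %.6D %C %b"
```
- `%i`: Job ID
- `%P`: Partition
- `%j`: Job name
- `%u`: User name
- `%t`: Job state (compact form)
- `%M`: Time used by the job
- `%D`: Number of nodes
- `%C`: Number of CPUs
- `%b`: Generic resources (GRES) requested per node, e.g. `gpu:2`; the exact form depends on your Slurm version (`%G`, by contrast, prints the job's group ID)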
3. If your cluster schedules GPUs as `GRES` (generic resources) and you only care about the GPU column, a shorter format is enough:
```bash
squeue -u $USER --format="%i %t %b"
```
This prints the job ID, job state, and requested generic resources. `%b` shows something like `gpu:2`, or `N/A` (`(null)` on some versions) for jobs that requested no GRES.
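Because the `%b` field of `squeue -o` is reported per node, totalling GPUs across jobs takes a little parsing. Here is a minimal sketch, assuming GRES strings of the form `gpu:2`, `gpu:a100:4`, or `gres/gpu:2`. The helper names `gres_gpu_count` and `total_gpus` are made up for illustration and are not part of Slurm:

```bash
# Hypothetical helper: print the GPU count found in one GRES string.
# Prints 0 for "N/A", "(null)", or any string without a gpu entry.
gres_gpu_count() {
    printf '%s\n' "$1" | awk '{
        s = $0
        gsub(/\([^)]*\)/, "", s)               # drop suffixes such as "(IDX:0-1)"
        total = 0
        n = split(s, parts, ",")
        for (i = 1; i <= n; i++) {
            if (parts[i] !~ "(^|[/:])gpu(:|$)") continue
            m = split(parts[i], f, ":")
            # a trailing number is the count; a bare "gpu" or "gpu:type" counts as 1
            total += (f[m] ~ /^[0-9]+$/) ? f[m] : 1
        }
        print total
    }'
}

# Hypothetical helper: read "jobid nodes gres" lines on stdin and print
# the total GPU count, assuming the GRES value applies to every node.
total_gpus() {
    local total=0 jobid nodes gres
    while read -r jobid nodes gres; do
        case "$nodes" in ''|*[!0-9]*) continue ;; esac   # skip malformed lines
        total=$(( total + nodes * $(gres_gpu_count "$gres") ))
    done
    echo "$total"
}
```

It could be fed with something like `squeue -h -u $USER -o "%i %D %b" | total_gpus`, where `-h` suppresses the header line. Treat the result as an estimate if your site uses GRES names other than `gpu`.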
4. Alternatively, `scontrol show job` prints the full details of a single job, including its GPU allocation:
```bash
scontrol show job JOB_ID
```
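Replace `JOB_ID` with your actual job ID. Then look for the `TresPerNode` field (or `Gres` on older Slurm versions), which lists the GPUs requested per node, e.g. `gres/gpu:2`. Depending on your version and configuration, the `TRES` line may also contain a `gres/gpu=N` entry, and `scontrol -d show job JOB_ID` additionally shows which GPU indices a running job was assigned.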
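The `scontrol` output is long, so it can help to filter it down to the GPU-related fields. A rough sketch follows, assuming the usual `Key=Value` layout separated by spaces. The function name `job_gpu_fields` is hypothetical, and the list of field names is a guess at what different Slurm versions print:

```bash
# Hypothetical helper: read `scontrol show job` text on stdin and print
# only the Key=Value fields that mention a GPU.
job_gpu_fields() {
    tr ' ' '\n' \
        | grep -E -i '^(TresPerNode|TresPerJob|Gres|TRES|ReqTRES|AllocTRES)=' \
        | grep -i 'gpu'
}
```

Usage would look like `scontrol show job JOB_ID | job_gpu_fields`. It prints nothing if the job has no GPU fields.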
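5. If you want to see the GPUs configured on each node rather than per job, use `sinfo`, which normally does not need administrator access (adding `GresUsed:25` to the field list also shows how many are currently in use):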
```bash
sinfo --Node --long --Format="NodeHost,Gres:25"
```
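To turn that node listing into a plain per-node GPU count, something like the following sketch could work. It assumes each input line is `node gres` with GRES values such as `gpu:a100:4(S:0-1)` or `(null)`, and the name `node_gpu_table` is made up for illustration:

```bash
# Hypothetical helper: read "node gres" lines on stdin and print "node gpus",
# once per node (a node may be listed once for each partition it belongs to).
node_gpu_table() {
    awk 'seen[$1]++ { next }
    {
        gres = $2
        gsub(/\([^)]*\)/, "", gres)            # drop "(S:0-1)"-style suffixes
        count = 0
        n = split(gres, parts, ",")
        for (i = 1; i <= n; i++) {
            if (parts[i] !~ "^gpu(:|$)") continue
            m = split(parts[i], f, ":")
            count += (f[m] ~ /^[0-9]+$/) ? f[m] : 1
        }
        print $1, count
    }'
}
```

For example: `sinfo -N -h -O "NodeHost,Gres:25" | node_gpu_table`. A GRES string longer than the 25-character column is cut off, so widen the column if the counts look wrong.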
Keep in mind that your cluster may use custom GRES names or types for GPUs (for example `gpu:a100`), so the exact output can differ from what is shown here. If something looks off, check your cluster's documentation or ask your system administrator.