Hello,
I was able to integrate Grafana usage graphs into our Open OnDemand using the instructions provided in the documentation. The documentation lists only the ‘cpu’ and ‘memory’ panels. Is there a way to get other panels to display? We also capture GPU and GPU memory usage, which we would like to display as well (blank/empty graphs if no GPU was requested … that’s fine). I couldn’t find whether ‘cpu’ and ‘memory’ are the only panel YAML keys available.
An OSC colleague commented on the same GitHub ticket that tracking GPU usage from Slurm and getting that into Prometheus is a bit of a challenge. Can you detail how you do it?
I’m using this Prometheus exporter: https://github.com/plazonic/nvidia_gpu_prometheus_exporter
In addition to running the exporter, I needed these scripts in our SLURM prolog.d and epilog.d directories, respectively. Discussion and examples of these can be found in the GitHub for the jobstats project from our neighbors down the road at Princeton University. https://github.com/PrincetonUniversity/jobstats
[root@gpu-node001 prolog.d]# cat /etc/slurm/prolog.d/gpustats_helper_prolog.sh
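#!/bin/bash
# Skip jobs that were not allocated a GPU.
[ -z "$CUDA_VISIBLE_DEVICES" ] && exit 0
DEST=/run/gpustat
# Create the marker directory on first use.
[ -e "$DEST" ] || mkdir -m 755 "$DEST"
# Write one file per allocated GPU index containing the job ID and user ID.
for i in ${GPU_DEVICE_ORDINAL//,/ } ${CUDA_VISIBLE_DEVICES//,/ }; do
    echo "$SLURM_JOB_ID $SLURM_JOB_UID" > "$DEST/$i"
done
exit 0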
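Only the prolog script is shown above. For anyone following along, here is a rough sketch of what a matching epilog cleanup could look like. This is not the poster's actual epilog script: the function name `cleanup_gpustat` and the `GPUSTAT_DIR` override are made up for illustration, and it assumes the same GPU variables are visible in the epilog environment, which may depend on your Slurm configuration. The jobstats repository linked above has the real examples.

```shell
#!/bin/bash
# Hypothetical epilog counterpart to the prolog above (a sketch, not the
# poster's script). GPUSTAT_DIR is an assumed override used only to make
# testing easier; the prolog hardcodes /run/gpustat.
DEST="${GPUSTAT_DIR:-/run/gpustat}"

cleanup_gpustat() {
    local i jobid rest
    # Nothing to do if the prolog never created the directory.
    [ -d "$DEST" ] || return 0
    # Intentionally unquoted so the comma-separated lists split into words.
    for i in ${GPU_DEVICE_ORDINAL//,/ } ${CUDA_VISIBLE_DEVICES//,/ }; do
        [ -f "$DEST/$i" ] || continue
        # Each marker file holds "<job id> <uid>"; only remove our own.
        read -r jobid rest < "$DEST/$i"
        if [ "$jobid" = "$SLURM_JOB_ID" ]; then
            rm -f "$DEST/$i"
        fi
    done
    return 0
}

cleanup_gpustat
```

The job ID check is a cautious design choice: if a new job's prolog has already rewritten the marker file for that GPU, the old job's epilog leaves it alone.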
Thanks for the additional information! I’ve updated the GitHub ticket with it. Yes, I’m guessing you want the GPU Utilization panel to show up in OnDemand.
Yes. GPU Memory Utilization too would be nice … basically mirroring what we can display now for CPU and memory.
