When I use the interactive desktop, I get the following errors:
[user01@node004 Desktop]$ glxgears
6465 frames in 5.0 seconds = 1292.977 FPS
4637 frames in 5.0 seconds = 927.256 FPS
X connection to :1.0 broken (explicit kill or server shutdown).
[user01@node004 Desktop]$ vglrun glxgears
Invalid MIT-MAGIC-COOKIE-1 key
[VGL] ERROR: Could not open display :0.
Hi, I have to say right at the start that these issues are very hard for us to debug because they're so specific to your environment.
The first question I'd ask is what scheduler you use and how you allocate a GPU and start that X server.
Here's what we do in a Slurm prolog:
if [[ "$SLURM_LOCALID" == "0" && "$SLURM_JOB_GRES" == *"vis"* ]]; then
if [ -n "$CUDA_VISIBLE_DEVICES" ]; then
FIRSTGPU=$(echo $CUDA_VISIBLE_DEVICES | tr ',' "\n" | head -1)
setsid /usr/bin/X :${FIRSTGPU} -noreset >& /dev/null &
sleep 2
if [ -n "$DISPLAY" ]; then
echo "export OLDDISPLAY=$DISPLAY"
fi
echo "export DISPLAY=:$FIRSTGPU"
fi
fi
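Note that this looks like a Slurm TaskProlog, where lines printed in the form "export NAME=value" get injected into the task's environment; that is how DISPLAY ends up set inside the desktop session. A quick, hedged way to confirm it took effect from a terminal inside the session (assuming the prolog above is in place):

echo $DISPLAY              # should be :<first allocated GPU index>, e.g. :0
pgrep -af '/usr/bin/X'     # the X server started by the prolog should be running
nvidia-smi                 # Xorg should typically show up in the process list on your GPU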
We initially had some weird vglrun issues as well, but none of them were related to OnDemand. A solution to one of the problems we encountered was specifying the display with "-d": vglrun -d :x.x glxgears, where :x.x is the display number, for example vglrun -d :0.0 glxgears. Sometimes the default display doesn't work, especially if you have multiple GPUs in the system.
Also, to help us troubleshoot this, could you provide some additional information: the output of nvidia-smi --query-gpu=gpu_bus_id --format=csv,noheader and also the full output of nvidia-smi?
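For context, a rough sketch of how we'd correlate that with what the job was actually allocated (this assumes the index and gpu_bus_id query fields, which recent drivers support):

echo "CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES"               # GPU index(es) Slurm handed the job
nvidia-smi --query-gpu=index,gpu_bus_id --format=csv,noheader   # map each index to its PCI bus ID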
This is just a personal preference, but I prefer glxspheres64 since it prints the frame rate and makes the FPS speedup easy to see.
$ /opt/VirtualGL/bin/glxspheres64
Polygons in scene: 62464 (61 spheres * 1024 polys/spheres)
GLX FB config ID of window: 0x163 (8/8/8/0)
Visual ID of window: 0x3f6
Context is Direct
OpenGL Renderer: llvmpipe (LLVM 12.0.1, 256 bits)
44.023927 frames/sec - 38.649486 Mpixels/sec
43.564125 frames/sec - 38.245817 Mpixels/sec
With basic vglrun:
$ vglrun /opt/VirtualGL/bin/glxspheres64
Polygons in scene: 62464 (61 spheres * 1024 polys/spheres)
GLX FB config ID of window: 0xdd (8/8/8/0)
Visual ID of window: 0x21
Segmentation fault (core dumped)
but with the display set:
$ vglrun -d :0.2 /opt/VirtualGL/bin/glxspheres64
Polygons in scene: 62464 (61 spheres * 1024 polys/spheres)
GLX FB config ID of window: 0x4cf (8/8/8/0)
Visual ID of window: 0x21
Context is Direct
OpenGL Renderer: Quadro M6000 24GB/PCIe/SSE2
724.313754 frames/sec - 635.889531 Mpixels/sec
746.916392 frames/sec - 655.732839 Mpixels/sec
In this example, I had to use the third screen, :0.2, because this system has 4 GPUs and I was allocated the third one. So you may have to test each display if you're on a multi-GPU system.
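If you'd rather not guess, here's a hedged sketch for probing each screen; it assumes a 4-GPU node with screens :0.0 through :0.3 and that glxinfo is installed, so adjust it for your system:

for s in 0 1 2 3; do
    echo "=== display :0.$s ==="
    # The renderer string tells you whether you hit a real GPU or software llvmpipe
    vglrun -d :0.$s glxinfo 2>/dev/null | grep "OpenGL renderer"
done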
Hi, I'm using the Slurm scheduler.
I had not added a gres parameter to the submit/slurm.yml.erb file in the bc_desktop directory before; I will try adding it.
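Once I've added the gres, I assume I can confirm the desktop job actually received it with something like this, from a terminal inside the session:

scontrol show job $SLURM_JOB_ID | grep -iE 'gres|tres'   # the requested gres should appear here
nvidia-smi -L                                            # GPUs actually visible to the session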
And are the following lines meant to go in a desktops/xfce.sh script?
if [[ "$SLURM_LOCALID" == "0" && "$SLURM_JOB_GRES" == *"vis"* ]]; then
if [ -n "$CUDA_VISIBLE_DEVICES" ]; then
FIRSTGPU=$(echo $CUDA_VISIBLE_DEVICES | tr ',' "\n" | head -1)
setsid /usr/bin/X :${FIRSTGPU} -noreset >& /dev/null &
sleep 2
if [ -n "$DISPLAY" ]; then
echo "export OLDDISPLAY=$DISPLAY"
fi
echo "export DISPLAY=:$FIRSTGPU"
fi
fi