Tip for anyone finding this thread: with the latest VirtualGL, 2.6.90 (3.0 beta1), which is still sort of in beta, you can use the EGL backend.
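My Slurm TaskProlog sets VGL_DISPLAY for each task (Slurm applies any "export NAME=value" line that the TaskProlog prints to stdout):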
#!/usr/bin/env python3
import glob
import os
import subprocess

if 'CUDA_VISIBLE_DEVICES' in os.environ:
    # Most reliable way to find the DRI device: ask nvidia-smi for the GPU's PCI bus ID
    gpu_bus_id = subprocess.getoutput('nvidia-smi --query-gpu=gpu_bus_id --format=csv,noheader').split('\n')[0]
    # nvidia-smi prints e.g. 00000000:3B:00.0, sysfs uses 0000:3b:00.0: strip the first 4 zeros and lowercase
    cards = glob.glob('/sys/bus/pci/devices/{0}/drm/card*'.format(gpu_bus_id[4:].lower()))
    if cards:
        print('export VGL_DISPLAY=/dev/dri/{0}'.format(os.path.basename(cards[0])))
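If you'd rather not depend on slicing off exactly four characters, the lookup can be pulled into a small function. This is a minimal sketch, assuming nvidia-smi reports the bus ID with an 8- or 4-hex-digit PCI domain (e.g. 00000000:3B:00.0) and that the kernel exposes the GPU's DRM nodes under /sys/bus/pci/devices/<id>/drm/. The names normalize_bus_id and dri_card_for_bus_id are hypothetical, and the sysfs_root parameter exists only so the function can be tested against a fake directory tree.

```python
import glob
import os


def normalize_bus_id(bus_id):
    """Convert an nvidia-smi bus ID like '00000000:3B:00.0' to the sysfs form '0000:3b:00.0'."""
    domain, rest = bus_id.strip().split(':', 1)
    # sysfs uses a 4-hex-digit PCI domain and lowercase hex digits
    return '{0:04x}:{1}'.format(int(domain, 16), rest.lower())


def dri_card_for_bus_id(bus_id, sysfs_root='/sys/bus/pci/devices'):
    """Return e.g. '/dev/dri/card1' for the GPU at bus_id, or None if it has no DRM card node."""
    pattern = os.path.join(sysfs_root, normalize_bus_id(bus_id), 'drm', 'card*')
    cards = sorted(glob.glob(pattern))
    if not cards:
        return None
    return '/dev/dri/' + os.path.basename(cards[0])


if __name__ == '__main__':
    print(normalize_bus_id('00000000:3B:00.0'))  # prints 0000:3b:00.0
```

In the prolog you would then print "export VGL_DISPLAY=" plus the returned path only when the result is not None, so a GPU without a DRM node simply leaves VGL_DISPLAY unset.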
In addition, you need to open up the permissions on the DRI render nodes so that ordinary users can use them. Since access to the NVIDIA device should already be locked down to the allocated GPU anyway, I think it's safe to use this rule:
# cat /etc/udev/rules.d/99-virtualgl-dri.rules
KERNEL=="renderD*", MODE="0666", OWNER="root", GROUP="root"
There's no need to set up an X server, and VirtualGL will run on the correct GPU. This works for all the GLX apps I've tested (MATLAB, Mathematica, glxgears, etc.).
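To double-check that VGL_DISPLAY really points at the GPU Slurm allocated, you can map the card node back to its PCI address and compare it with what nvidia-smi reports. A sketch assuming the usual /sys/class/drm/<card>/device symlink to the GPU's PCI device directory; bus_id_for_dri_card is a hypothetical name and sysfs_class is a parameter only for testing.

```python
import os


def bus_id_for_dri_card(card_path, sysfs_class='/sys/class/drm'):
    """Map e.g. '/dev/dri/card1' to the PCI address of its GPU, e.g. '0000:3b:00.0'."""
    card = os.path.basename(card_path)
    # /sys/class/drm/<card>/device is a symlink to the GPU's PCI device directory
    return os.path.basename(os.path.realpath(os.path.join(sysfs_class, card, 'device')))


if __name__ == '__main__':
    display = os.environ.get('VGL_DISPLAY')
    if display:
        print(display, '->', bus_id_for_dri_card(display))
    else:
        print('VGL_DISPLAY is not set')
```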