Correction: we have made some small progress. The fix in submit.yml.erb now passes the GPU parameters to the Slurm job, but I am still getting permission issues.
I am in the vglusers group:
(base) [chris.welsh@gpu002 ~]$ groups
stuff deleted... jupyterhub_users vglusers
/dev/dri/*
[root@gpu002 dri]# ls -l
total 0
drwxr-xr-x. 2 root root 100 Sep 11 15:40 by-path
crw-rw----. 1 root vglusers 226, 0 Sep 11 15:40 card0
crw-rw----. 1 root vglusers 226, 1 Sep 11 15:40 card1
crw-rw----. 1 root vglusers 226, 128 Sep 11 15:40 renderD128
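The listing above only shows the file mode bits as seen by root. A quick way to check what the job itself is allowed to do is to try opening each device node from a terminal inside the desktop session. This is a minimal sketch, not a definitive diagnostic: the function name check_dri_access is made up for illustration, and it assumes bash and that opening the node read/write is what the application needs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: try to open each DRI device node read/write and
# report the result. An actual open is attempted (not just a mode-bit
# test), so any restriction applied to the calling process shows up too.
check_dri_access() {
    local dir="${1:-/dev/dri}" dev
    for dev in "$dir"/card* "$dir"/renderD*; do
        [ -e "$dev" ] || continue          # glob did not match anything
        if ( : <>"$dev" ) 2>/dev/null; then
            echo "ok $dev"
        else
            echo "denied $dev"
        fi
    done
}

check_dri_access /dev/dri
```

Running `id -Gn` in the same terminal shows the groups the session process actually has, which can differ from what `groups` reports in a separate login shell.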
gres.conf
NodeName=gpu002 Name=gpu Type=a30 File=/dev/nvidia0
/etc/slurm/slurm.conf
GresTypes=gpu,gpu:H100:2,gpu:a30:1,l40s:1
AccountingStorageTRES=gres/gpu,gres/gpu:H100,gres/gpu:a30,gres/gpu:l40s
NodeName=gpu002 NodeAddr=10.deleted CPUs=64 Feature=NUMA,AMD,GPU Boards=1 SocketsPerBoard=2 CoresPerSocket=32 ThreadsPerCore=1 RealMemory=515024 Gres=gpu:a30:1 Weight=60
[root@gpu002 ~]# nvidia-smi
Sat Sep 13 10:44:38 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 565.57.01 Driver Version: 565.57.01 CUDA Version: 12.7 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA A30 Off | 00000000:21:00.0 Off | 0 |
| N/A 31C P0 30W / 165W | 1MiB / 24576MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
submit.yml.erb
---
batch_connect:
  template: vnc
script:
  native:
    - "--gpus-per-node=1"
    - "--gres=gpu"
    - "--ntasks=1"
I can now see the GPU allocation in the job details below, but I still have no permissions.
[root@login001 ~]# scontrol show job 593703
JobId=593703 JobName=sys/dashboard/dev/bc_desktop
UserId=chris.welsh(37738) GroupId=chris.welsh.dg(41023) MCS_label=N/A
Priority=1 Nice=0 Account=itec1 QOS=normal
JobState=RUNNING Reason=None Dependency=(null)
Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
RunTime=00:04:10 TimeLimit=01:00:00 TimeMin=N/A
SubmitTime=2025-09-13T10:53:04 EligibleTime=2025-09-13T10:53:04
AccrueTime=2025-09-13T10:53:04
StartTime=2025-09-13T10:53:06 EndTime=2025-09-13T11:53:06 Deadline=N/A
SuspendTime=None SecsPreSuspend=0 LastSchedEval=2025-09-13T10:53:06 Scheduler=Main
Partition=GPU_SHORT AllocNode:Sid=172.16.12.16:251917
ReqNodeList=(null) ExcNodeList=(null)
NodeList=gpu002
BatchHost=gpu002
NumNodes=1 NumCPUs=1 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
ReqTRES=cpu=1,mem=8G,node=1,billing=1,gres/gpu=1
AllocTRES=cpu=1,mem=8G,node=1,billing=1,gres/gpu=1,gres/gpu:a30=1
Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
MinCPUsNode=1 MinMemoryCPU=8G MinTmpDiskNode=0
Features=(null) DelayBoot=00:00:00
OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
Command=(null)
WorkDir=/home/chris.welsh/ondemand/data/sys/dashboard/batch_connect/dev/bc_desktop/output/1107da7e-6730-4779-a026-cd69d5cf6a7c
StdErr=/home/chris.welsh/ondemand/data/sys/dashboard/batch_connect/dev/bc_desktop/output/1107da7e-6730-4779-a026-cd69d5cf6a7c/output.log
StdIn=/dev/null
StdOut=/home/chris.welsh/ondemand/data/sys/dashboard/batch_connect/dev/bc_desktop/output/1107da7e-6730-4779-a026-cd69d5cf6a7c/output.log
TresPerNode=gres/gpu:1,gres/gpu
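To confirm the allocation without reading the whole record, the GRES entries can be pulled out of the AllocTRES field. A small sketch, assuming the `scontrol show job` layout shown above; the function name alloc_gres is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `scontrol show job` output on stdin and print
# only the gres/... entries of the AllocTRES field, one per line.
alloc_gres() {
    tr ' ' '\n' |                 # one key=value token per line
        sed -n 's/^AllocTRES=//p' |   # keep only the AllocTRES value
        tr ',' '\n' |             # one TRES entry per line
        grep '^gres/'             # keep GRES entries only
}
```

Usage would be along the lines of `scontrol show job 593703 | alloc_gres`; for the job above this should list the gres/gpu and gres/gpu:a30 entries.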
output.log
Desktop 'TurboVNC: gpu002.meerkat.mcri.edu.au:1 (chris.welsh)' started on display gpu002.meerkat.mcri.edu.au:1
Log file is vnc.log
Successfully started VNC server on gpu002.meerkat.mcri.edu.au:5901...
Script starting...
Starting websocket server...
Launching desktop 'xfce'...
[websockify]: pid: 301508 (proxying 30159 ==> localhost:5901)
[websockify]: log file: ./websockify.log
[websockify]: waiting ...
[VGL] Shared memory segment ID for vglconfig: 262205
[VGL] VirtualGL v3.1.3 64-bit (Build 20250409)
[VGL] NOTICE: Replacing dlopen("libGLX.so.1") with dlopen("libvglfaker.so")
[VGL] WARNING: The EGL back end requires a 2D X server with a GLX extension.
/usr/bin/iceauth: creating new authority file /run/user/37738/ICEauthority
[VGL] Shared memory segment ID for vglconfig: 262206
[VGL] VirtualGL v3.1.3 64-bit (Build 20250409)
[VGL] NOTICE: Replacing dlopen("libGLX.so.1") with dlopen("libvglfaker.so")
[VGL] WARNING: The EGL back end requires a 2D X server with a GLX extension.
(xfwm4:301559): xfwm4-WARNING **: 10:53:10.634: GLX extension missing, GLX support disabled.
[VGL] Shared memory segment ID for vglconfig: 294914
[VGL] VirtualGL v3.1.3 64-bit (Build 20250409)
[VGL] NOTICE: Replacing dlopen("libGLX.so.1") with dlopen("libvglfaker.so")
[VGL] WARNING: The EGL back end requires a 2D X server with a GLX extension.
[VGL] Shared memory segment ID for vglconfig: 294915
[VGL] VirtualGL v3.1.3 64-bit (Build 20250409)
[VGL] NOTICE: Replacing dlopen("libGLX.so.1") with dlopen("libvglfaker.so")
[VGL] WARNING: The EGL back end requires a 2D X server with a GLX extension.
[VGL] Shared memory segment ID for vglconfig: 294916
[VGL] VirtualGL v3.1.3 64-bit (Build 20250409)
[VGL] NOTICE: Replacing dlopen("libGLX.so.1") with dlopen("libvglfaker.so")
[VGL] WARNING: The EGL back end requires a 2D X server with a GLX extension.
[VGL] Shared memory segment ID for vglconfig: 294917
[VGL] VirtualGL v3.1.3 64-bit (Build 20250409)
[VGL] NOTICE: Replacing dlopen("libGLX.so.1") with dlopen("libvglfaker.so")
[VGL] WARNING: The EGL back end requires a 2D X server with a GLX extension.
[VGL] Shared memory segment ID for vglconfig: 294925
[VGL] VirtualGL v3.1.3 64-bit (Build 20250409)
[VGL] NOTICE: Replacing dlopen("libGLX.so.1") with dlopen("libvglfaker.so")
[VGL] WARNING: The EGL back end requires a 2D X server with a GLX extension.
[VGL] Shared memory segment ID for vglconfig: 294926
[VGL] VirtualGL v3.1.3 64-bit (Build 20250409)
[VGL] Shared memory segment ID for vglconfig: 294927
[VGL] VirtualGL v3.1.3 64-bit (Build 20250409)
[VGL] NOTICE: Replacing dlopen("libGLX.so.1") with dlopen("libvglfaker.so")
[VGL] WARNING: The EGL back end requires a 2D X server with a GLX extension.
[VGL] NOTICE: Replacing dlopen("libGLX.so.1") with dlopen("libvglfaker.so")
[VGL] WARNING: The EGL back end requires a 2D X server with a GLX extension.
[VGL] Shared memory segment ID for vglconfig: 294928
[VGL] VirtualGL v3.1.3 64-bit (Build 20250409)
[VGL] NOTICE: Replacing dlopen("libGLX.so.1") with dlopen("libvglfaker.so")
[VGL] WARNING: The EGL back end requires a 2D X server with a GLX extension.
[VGL] Shared memory segment ID for vglconfig: 294930
[VGL] VirtualGL v3.1.3 64-bit (Build 20250409)
[VGL] Shared memory segment ID for vglconfig: 294931
[VGL] VirtualGL v3.1.3 64-bit (Build 20250409)
[VGL] NOTICE: Replacing dlopen("libGLX.so.1") with dlopen("libvglfaker.so")
[VGL] WARNING: The EGL back end requires a 2D X server with a GLX extension.
[VGL] NOTICE: Replacing dlopen("libGLX.so.1") with dlopen("libvglfaker.so")
[VGL] WARNING: The EGL back end requires a 2D X server with a GLX extension.
[VGL] Shared memory segment ID for vglconfig: 294932
[VGL] VirtualGL v3.1.3 64-bit (Build 20250409)
[VGL] NOTICE: Replacing dlopen("libGLX.so.1") with dlopen("libvglfaker.so")
[VGL] WARNING: The EGL back end requires a 2D X server with a GLX extension.
[VGL] Shared memory segment ID for vglconfig: 294933
[VGL] VirtualGL v3.1.3 64-bit (Build 20250409)
ERROR: The current user does not have permission for operation
[VGL] NOTICE: Replacing dlopen("libGLX.so.1") with dlopen("libvglfaker.so")
[VGL] WARNING: The EGL back end requires a 2D X server with a GLX extension.
[VGL] Shared memory segment ID for vglconfig: 294934
[VGL] VirtualGL v3.1.3 64-bit (Build 20250409)
[websockify]: started successfully (proxying 30159 ==> localhost:5901)
Scanning VNC log file for user authentications...
Generating connection YAML file...
[VGL] NOTICE: Replacing dlopen("libGLX.so.1") with dlopen("libvglfaker.so")
[VGL] WARNING: The EGL back end requires a 2D X server with a GLX extension.
** (wrapper-2.0:301607): WARNING **: 10:53:11.582: No outputs have backlight property
[VGL] Shared memory segment ID for vglconfig: 294940
[VGL] VirtualGL v3.1.3 64-bit (Build 20250409)
[VGL] NOTICE: Replacing dlopen("libGLX.so.1") with dlopen("libvglfaker.so")
[VGL] WARNING: The EGL back end requires a 2D X server with a GLX extension.
(wrapper-2.0:301606): libnotify-WARNING **: 10:53:11.771: Failed to connect to proxy
(wrapper-2.0:301607): Gtk-CRITICAL **: 10:53:11.793: gtk_icon_theme_has_icon: assertion 'icon_name != NULL' failed
(wrapper-2.0:301607): Gtk-CRITICAL **: 10:53:11.802: gtk_icon_theme_has_icon: assertion 'icon_name != NULL' failed
(wrapper-2.0:301607): Gtk-CRITICAL **: 10:53:11.802: gtk_icon_theme_has_icon: assertion 'icon_name != NULL' failed
(wrapper-2.0:301607): Gtk-CRITICAL **: 10:53:11.829: gtk_icon_theme_has_icon: assertion 'icon_name != NULL' failed
(nm-applet:301640): libnotify-WARNING **: 10:53:11.840: Failed to connect to proxy
(nm-applet:301640): nm-applet-WARNING **: 10:53:11.841: Failed to show notification: GDBus.Error:org.freedesktop.DBus.Error.ServiceUnknown: The name org.freedesktop.Notifications was not provided by any .service files
(nm-applet:301640): nm-applet-WARNING **: 10:53:11.848: Failed to show notification: GDBus.Error:org.freedesktop.DBus.Error.ServiceUnknown: The name org.freedesktop.Notifications was not provided by any .service files
** (xfdesktop:301587): WARNING **: 10:53:13.026: Failed to register the newly set background with AccountsService '/usr/share/backgrounds/xfce/xfce-leaves.svg': GDBus.Error:org.freedesktop.DBus.Error.InvalidArgs: No such interface “org.freedesktop.DisplayManager.AccountsService”
(wrapper-2.0:301606): pulseaudio-plugin-WARNING **: 10:53:17.982: Disconnected from the PulseAudio server. Attempting to reconnect in 5 seconds...
Failed to create secure directory (/run/user/37738/pulse): No such file or directory
Failed to create secure directory (/run/user/37738/pulse): No such file or directory
Xlib: extension "DPMS" missing on display ":1.0".
Xlib: extension "DPMS" missing on display ":1.0".
Xlib: extension "DPMS" missing on display ":1.0".
Xlib: extension "DPMS" missing on display ":1.0".
But as you can see in the log above, the permission error is still there ("ERROR: The current user does not have permission for operation").
It is also evident below.