Hi all, I recently got an installation of OOD (Open OnDemand) running on my HPC cluster and added a desktop app by cloning the “bc_desktop” repo.
The desktop app launches successfully, but I get a “Failed to connect to server” error on the noVNC page. The output.log is below:
Setting VNC password...
Starting VNC server...
WARNING: n002.cluster.com:1 is taken because of /tmp/.X1-lock
Remove this file if there is no X server n002.cluster.com:1
Killing Xvnc process ID 158023
Xvnc process ID 158023 already killed
Xvnc did not appear to shut down cleanly. Removing /tmp/.X11-unix/X1
Xvnc did not appear to shut down cleanly. Removing /tmp/.X1-lock
Desktop 'TurboVNC: n002.cluster.com:1 (faizanbadami)' started on display n002.cluster.com:1
Log file is vnc.log
Successfully started VNC server on n002.cluster.com:5901...
Script starting...
Starting websocket server...
/var/spool/slurmd/job31383/slurm_script: line 193: /usr/bin/websockify: No such file or directory
cmdTrace.c(713):ERROR:104: 'restore' is an unrecognized subcommand
cmdModule.c(411):ERROR:104: 'restore' is an unrecognized subcommand
Scanning VNC log file for user authentications...
Generating connection YAML file...
Launching desktop 'xfce'...
dbus[178581]: Unable to set up transient service directory: XDG_RUNTIME_DIR "/run/user/1001" not available: No such file or directory
generating cookie with syscall
generating cookie with syscall
generating cookie with syscall
generating cookie with syscall
(xfce4-session:178584): xfce4-session-WARNING **: 15:49:25.681: xfsm_manager_load_session: Something wrong with /home/faizanbadami/.cache/sessions/xfce4-session-n002.cluster.com:1, Does it exist? Permissions issue?
(xfwm4:178591): xfwm4-WARNING **: 15:49:25.792: Error opening /dev/dri/card0: No such file or directory
SELinux Troubleshooter: Applet requires SELinux be enabled to run.
vmware-user: could not open /proc/fs/vmblock/dev
/usr/share/system-config-printer/applet.py:44: PyGIWarning: Notify was imported without specifying a version first. Use gi.require_version('Notify', '0.7') before import to ensure that the right version gets loaded.
from gi.repository import Notify
system-config-printer-applet: failed to start NewPrinterNotification service
system-config-printer-applet: failed to start PrinterDriversInstaller service: org.freedesktop.DBus.Error.AccessDenied: Connection ":1.109822" is not allowed to own the service "com.redhat.PrinterDriversInstaller" due to security policies in the configuration file
I tried the solutions recommended here, here and here, but wasn't able to solve the issue.
Please see attached my websocket settings and my ood_portal.yml.
Two errors stick out to me. First, the job script can't find websockify at /usr/bin/websockify. Second, your XDG_RUNTIME_DIR points to a directory that doesn't exist.
/var/spool/slurmd/job31383/slurm_script: line 193: /usr/bin/websockify: No such file or directory
dbus[178581]: Unable to set up transient service directory: XDG_RUNTIME_DIR "/run/user/1001" not available: No such file or directory
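To narrow down the first error, a quick check can be run on the compute node itself (for example from an interactive job). This is only a sketch: `check_exec` is a hypothetical helper name, and the path is the one reported in the log above.

```shell
#!/bin/sh
# check_exec PATH: succeed if PATH exists and is executable.
# (check_exec is a hypothetical helper name, not part of OOD.)
check_exec() {
  if [ -x "$1" ]; then
    echo "ok: $1"
  else
    echo "missing or not executable: $1" >&2
    return 1
  fi
}

# Path taken from the error in output.log; "|| true" keeps the script going.
check_exec /usr/bin/websockify || true

# Show where websockify actually lives on this node, if it is on PATH.
command -v websockify || echo "websockify is not on PATH"
```

If websockify turns out to be installed somewhere else, the job script needs to be pointed at that location rather than /usr/bin/websockify.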
Thank you for the quick response. I was able to correct the websockify error. For the XDG error:
dbus[76297]: Unable to set up transient service directory: XDG_RUNTIME_DIR "/run/user/1001" not available: No such file or directory
I added the recommended export command to submit.yml.erb under /etc/ood/config/apps/bc_desktop/submit, and added the line submit: submit/submit.yml.erb to my hpc_altoneuro.yml under /etc/ood/config/apps/bc_desktop/.
This causes a different error and the job never submits:
#<LoadError: Could not load 'vnc export XDG_RUNTIME_DIR="$TMPDIR/xdg_runtime"'. Make sure that that batch connect template in the configuration file is valid.>
So I had to manually create a /tmp/xdg_runtime directory for the error to go away (after making the change you recommended, of course). I'm not sure whether that's the right thing to do here.
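One way to avoid creating the directory by hand on every node is to have the job create a private one before the desktop starts. The XDG Base Directory specification expects XDG_RUNTIME_DIR to be owned by the user with mode 0700, which a shared /tmp/xdg_runtime would not satisfy for more than one user. A minimal sketch, assuming it runs inside the job before the desktop launches; the function name `setup_xdg_runtime_dir` and the directory naming are illustrative, and falling back to /tmp when TMPDIR is unset is an assumption:

```shell
#!/bin/sh
# setup_xdg_runtime_dir: create a private per-user runtime directory and export it.
# The function name and directory layout are illustrative, not an OOD API.
setup_xdg_runtime_dir() {
  base="${TMPDIR:-/tmp}"            # assumption: fall back to /tmp if TMPDIR is unset
  dir="$base/xdg_runtime_$(id -u)"  # one directory per user ID
  mkdir -p "$dir" || return 1
  chmod 700 "$dir" || return 1      # the XDG spec expects mode 0700
  XDG_RUNTIME_DIR="$dir"
  export XDG_RUNTIME_DIR
}

setup_xdg_runtime_dir && echo "XDG_RUNTIME_DIR=$XDG_RUNTIME_DIR"
```

The LoadError above reads as if the export line was parsed as part of the `template` value, so shell lines like these probably belong under a separate key in submit.yml.erb rather than directly after `template: vnc`. The OOD documentation describes a `script_wrapper` option for wrapping the job script; treat the exact key as something to verify against the docs for your version.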
Neither the websockify error nor the XDG error shows up in output.log anymore, but I still cannot connect to the desktop.
Setting VNC password...
Starting VNC server...
WARNING: n001.cluster.com:1 is taken because of /tmp/.X1-lock
Remove this file if there is no X server n001.cluster.com:1
Killing Xvnc process ID 10270
Xvnc process ID 10270 already killed
Xvnc did not appear to shut down cleanly. Removing /tmp/.X11-unix/X1
Xvnc did not appear to shut down cleanly. Removing /tmp/.X1-lock
Desktop 'TurboVNC: n001.cluster.com:1 (faizanbadami)' started on display n001.cluster.com:1
Log file is vnc.log
Successfully started VNC server on n001.cluster.com:5901...
Script starting...
Starting websocket server...
cmdTrace.c(713):ERROR:104: 'restore' is an unrecognized subcommand
cmdModule.c(411):ERROR:104: 'restore' is an unrecognized subcommand
Launching desktop 'xfce'...
WebSocket server settings:
- Listen on :54400
- Flash security policy server
- No SSL/TLS support (no cert file)
- Backgrounding (daemon)
Scanning VNC log file for user authentications...
Generating connection YAML file...
generating cookie with syscall
generating cookie with syscall
generating cookie with syscall
generating cookie with syscall
(xfce4-session:13590): xfce4-session-WARNING **: 11:59:14.617: xfsm_manager_load_session: Something wrong with /home/faizanbadami/.cache/sessions/xfce4-session-n001.cluster.com:1, Does it exist? Permissions issue?
(xfwm4:13597): xfwm4-WARNING **: 11:59:14.700: Error opening /dev/dri/card0: No such file or directory
SELinux Troubleshooter: Applet requires SELinux be enabled to run.
vmware-user: could not open /proc/fs/vmblock/dev
/usr/share/system-config-printer/applet.py:44: PyGIWarning: Notify was imported without specifying a version first. Use gi.require_version('Notify', '0.7') before import to ensure that the right version gets loaded.
from gi.repository import Notify
system-config-printer-applet: failed to start NewPrinterNotification service
system-config-printer-applet: failed to start PrinterDriversInstaller service: org.freedesktop.DBus.Error.AccessDenied: Connection ":1.110814" is not allowed to own the service "com.redhat.PrinterDriversInstaller" due to security policies in the configuration file
I think that could be it. I believe the VNC library here requires a secure connection. Do you see any error messages in your browser's console? (Open the developer tools with F12 and navigate to the Console tab.)
No SSL yet. Whenever I've tried to add the SSL cert and key to my ood_portal.yml, httpd has failed to restart. In the ood_portal.yml above, uncommenting lines 32, 33 and 34 prevents httpd from restarting. The cert and key come from the registrar where the domain is registered. Any advice on how to get HTTPS going?