I’m currently unable to resolve the following problem. When I try to use the remote desktop, everything seems to work fine (access to the node, loading the modules for TurboVNC, websockify and nmap-ncat) until the request GET wss://…/rnode/…/websockify, which returns status code 404.
I’m guessing I am missing something with the rnode part. So far I have (in the ood_portal.yml):
node_uri: '/node'
rnode_uri: '/rnode'
host_regex: '[^/]+'
and in the ood-portal.conf, which is generated after running the Ansible role:
<LocationMatch "^/rnode/(?<host>[^/]+)/(?<port>\d+)(?<uri>/.*|)">
AuthType Basic
AuthName "private"
AuthUserFile "/etc/apache2/.htpasswd"
RequestHeader unset Authorization
Require valid-user
I don’t know what I am missing. Should I modify something, since websockify is not installed directly but is made available through module load? The necessary module loads are in /opt/ood/gems/gems/ood_core-0.23.5/lib/ood_core/batch_connect/template.rb, in the base_script template.
I think the host_regex is likely your culprit. You need to modify it so that it captures what is needed; as it stands, the regex doesn’t seem to match anything of interest in a URL-type string.
Use something like regex101 to home in on that regex and make sure it’s extracting the name as you expect.
Thank you! I’ll look further into this and come back to say how it went.
I thought this was OK since our nodes have simple hostnames such as kyle03, and the URL looked fine too (wss://dev.dce-cs.fr/rnode/kyle03/41226/websockify).
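A quick way to check the host_regex claim offline is to run the URL path through the same pattern. The sketch below is only an approximation: it assumes the generated rnode LocationMatch uses named groups called host, port and uri, and it rewrites them in Python's (?P<name>...) spelling, since Python does not accept the (?<name>...) form that Apache uses. The function name parse_rnode_path is made up for illustration.

```python
import re

# host_regex value from ood_portal.yml
HOST_REGEX = r"[^/]+"

# Assumed shape of the generated rnode LocationMatch pattern, with the
# named groups written the Python way: (?P<name>...) instead of (?<name>...).
RNODE_PATTERN = re.compile(
    rf"^/rnode/(?P<host>{HOST_REGEX})/(?P<port>\d+)(?P<uri>/.*|)"
)


def parse_rnode_path(path):
    """Return (host, port, uri) if the path matches the pattern, else None."""
    match = RNODE_PATTERN.match(path)
    if match is None:
        return None
    return match.group("host"), int(match.group("port")), match.group("uri")


# Path part of wss://dev.dce-cs.fr/rnode/kyle03/41226/websockify
print(parse_rnode_path("/rnode/kyle03/41226/websockify"))
# -> ('kyle03', 41226, '/websockify')
```

If this prints the expected host and port, the regex itself is probably not what produces the 404, and the proxy configuration that is actually serving the request is the next thing to inspect.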
Since I am able to connect to the VNC server if I bypass websockify, I guess the problem is with websockify or the reverse proxy. To be more precise, when I say I bypass websockify, here is what I am doing:
I connect to the OOD portal.
I launch the remote desktop.
Once the compute node is accessible and the noVNC page shows the error, I run TurboVNC on another computer with the node’s port and the password, and I get the desktop running just as it should in the web browser.
For example, here is the output.log of a session I just tried:
Setting VNC password...
Starting VNC server...
Desktop 'TurboVNC: kyle24:1 (gillard)' started on display kyle24:1
Log file is vnc.log
Successfully started VNC server on kyle24:5901...
Script starting...
Starting websocket server...
The system default contains no modules
(env var: LMOD_SYSTEM_DEFAULT_MODULES is empty)
No changes in loaded modules
Launching desktop 'xfce'...
/usr/bin/iceauth: creating new authority file /run/user/35714/ICEauthority
(xfwm4:12684): xfwm4-WARNING **: 09:09:44.125: Unsupported GL renderer (llvmpipe (LLVM 15.0.7, 256 bits)).
WebSocket server settings:
- Listen on :18695
- No SSL/TLS support (no cert file)
- Backgrounding (daemon)
Scanning VNC log file for user authentications...
Generating connection YAML file...
(polkit-gnome-authentication-agent-1:12746): polkit-gnome-1-WARNING **: 09:09:45.652: Unable to determine the session we are in: No session for pid 12746
** (xiccd:12745): WARNING **: 09:09:45.816: EDID is empty
** (xfce4-screensaver:12733): WARNING **: 09:09:45.852: screensaver already running in this session
** (xiccd:12745): CRITICAL **: 09:09:45.859: failed to create colord device: failed to obtain org.freedesktop.color-manager.create-device auth
(wrapper-2.0:12799): GLib-GIO-CRITICAL **: 09:09:47.632: g_file_new_for_path: assertion 'path != NULL' failed
(wrapper-2.0:12799): GLib-GIO-CRITICAL **: 09:09:47.632: g_file_monitor_file: assertion 'G_IS_FILE (file)' failed
(wrapper-2.0:12799): GLib-GObject-WARNING **: 09:09:47.632: invalid (NULL) pointer instance
(wrapper-2.0:12799): GLib-GObject-CRITICAL **: 09:09:47.632: g_signal_connect_data: assertion 'G_TYPE_CHECK_INSTANCE (instance)' failed
(wrapper-2.0:12799): Gtk-WARNING **: 09:09:47.632: Attempting to add a widget with type GtkToggleButton to a container of type XfcePanelPlugin, but the widget is already inside a container of type XfcePanelPlugin, please remove the widget from its existing container first.
(wrapper-2.0:12799): Gtk-WARNING **: 09:09:47.842: Negative content width -3 (allocation 1, extents 2x2) while allocating gadget (node button, owner GtkToggleButton)
(wrapper-2.0:12798): Gtk-WARNING **: 09:09:47.939: Negative content width -3 (allocation 1, extents 2x2) while allocating gadget (node button, owner PulseaudioButton)
(wrapper-2.0:12798): pulseaudio-plugin-WARNING **: 09:09:50.883: Disconected from the PulseAudio server. Attempting to reconnect in 5 seconds.
Failed to create secure directory (/run/user/35714/pulse): No such file or directory
Failed to create secure directory (/run/user/35714/pulse): No such file or directory
** (xiccd:12745): WARNING **: 09:21:15.551: EDID is empty
Setting VNC password...
Generating connection YAML file...
The last part is from when I connected with TurboVNC from the other computer.
In the log file of websockify on the compute node:
WebSocket server settings:
- Listen on :18695
- No SSL/TLS support (no cert file)
- Backgrounding (daemon)
- proxying from :18695 to localhost:5901
I don’t understand why it does not detect SSL, since I have run certbot and HTTPS on the OOD portal is working fine. I tried adding the paths of the SSL files directly in the ood_portal.yml, but nothing changed in the websockify logs. But maybe it is not related to my problem?
I don’t know if this can help diagnose my problem, but when I connect directly over HTTPS to the machine’s IP on the associated port, the following error appears in the websockify log (I only show one; it’s repeated 10 times):
WebSocket server settings:
- Listen on :26593
- No SSL/TLS support (no cert file)
- Backgrounding (daemon)
- proxying from :26593 to localhost:5908
[IP]: SSL connection but '/usr/users/[...]/ondemand/data/sys/dashboard/batch_connect/sys/bc_desktop/dce/output/984dbfed-9ad2-477e-a11c-b0606f434bc6/self.pem' not found
Since running certbot changes the SSL part of ood-portal.conf, I tried with and without it, but I get the same error regardless.
I finally solved my problem: I had been looking at the ood-portal.conf file, but it was actually the ood-portal-le-ssl.conf file that didn’t contain the node and rnode parts. For now I have just copy-pasted those parts from the former into the latter, and tadaa! It now works as intended! I still need to regenerate this ood-portal-le-ssl.conf file properly, with certbot I guess; I had never noticed that this file was not modified when I reapplied the configuration. I suppose I am not using the right option for that, but it’s just something I need to look into a little more.
Anyway, I’m very happy to finally have the remote desktop up and running!
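For anyone hitting the same 404, the check that found the problem (does the SSL vhost file contain the node and rnode blocks at all?) can be sketched in a few lines. This is a rough illustration only: the function name missing_proxy_locations and the two sample strings are hypothetical stand-ins for the real files, and the test simply looks for a LocationMatch directive whose pattern starts with the given URI.

```python
import re


def missing_proxy_locations(conf_text, uris=("/node", "/rnode")):
    """Return the URIs with no <LocationMatch "^URI/..."> directive in conf_text."""
    missing = []
    for uri in uris:
        pattern = re.compile(r'<LocationMatch\s+"\^' + re.escape(uri) + r"/")
        if not pattern.search(conf_text):
            missing.append(uri)
    return missing


# Hypothetical excerpts standing in for the two vhost files.
http_conf = '''
<LocationMatch "^/node/(?<host>[^/]+)/(?<port>\\d+)(?<uri>/.*|)">
</LocationMatch>
<LocationMatch "^/rnode/(?<host>[^/]+)/(?<port>\\d+)(?<uri>/.*|)">
</LocationMatch>
'''
ssl_conf = "<VirtualHost *:443>\n</VirtualHost>\n"

print(missing_proxy_locations(http_conf))  # []
print(missing_proxy_locations(ssl_conf))   # ['/node', '/rnode']
```

In practice the text would come from reading ood-portal.conf and ood-portal-le-ssl.conf from wherever Apache keeps them on the system; a non-empty result for the SSL file corresponds to the situation described above.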