We are having some issues setting up interactive desktops. I found a similar discussion here, but it doesn’t appear to solve our problem: noVNC: Failed to connect to server
The issue is the same: we get “noVNC: Failed to connect to server” on the client side.
This is the output.log:
Desktop 'TurboVNC: node049:2 (first.last)' started on display node049:2
Log file is vnc.log
Successfully started VNC server on :5902...
Script starting...
Starting websocket server...
Resetting modules to system default. Reseting $MODULEPATH back to system default. All extra directories will be removed from $MODULEPATH.
Launching desktop 'xfce'...
/usr/local/lib/python3.6/site-packages/websockify-0.10.0-py3.6.egg/websockify/websocket.py:31: UserWarning: no 'numpy' module, HyBi protocol will be slower
warnings.warn("no 'numpy' module, HyBi protocol will be slower")
WebSocket server settings:
- Listen on :35036
- No SSL/TLS support (no cert file)
- Backgrounding (daemon)
Scanning VNC log file for user authentications...
Generating connection YAML file...
generating cookie with syscall
generating cookie with syscall
generating cookie with syscall
generating cookie with syscall
And the vnc.log:
TurboVNC Server (Xvnc) 64-bit v2.2.5 (build 20200507)
Copyright (C) 1999-2020 The VirtualGL Project and many others (see README.txt)
Visit http://www.TurboVNC.org for more information on TurboVNC
30/11/2022 16:19:24 Using security configuration file /etc/turbovncserver-security.conf
30/11/2022 16:19:24 Enabled security type 'tlsvnc'
30/11/2022 16:19:24 Enabled security type 'tlsotp'
30/11/2022 16:19:24 Enabled security type 'tlsplain'
30/11/2022 16:19:24 Enabled security type 'x509vnc'
30/11/2022 16:19:24 Enabled security type 'x509otp'
30/11/2022 16:19:24 Enabled security type 'x509plain'
30/11/2022 16:19:24 Enabled security type 'vnc'
30/11/2022 16:19:24 Enabled security type 'otp'
30/11/2022 16:19:24 Enabled security type 'unixlogin'
30/11/2022 16:19:24 Enabled security type 'plain'
30/11/2022 16:19:24 Desktop name 'TurboVNC: node049:2 (first.last)' (node049:2)
30/11/2022 16:19:24 Protocol versions supported: 3.3, 3.7, 3.8, 3.7t, 3.8t
30/11/2022 16:19:24 Listening for VNC connections on TCP port 5902
30/11/2022 16:19:24 Interface 0.0.0.0
30/11/2022 16:19:24 Framebuffer: BGRX 8/8/8/8
30/11/2022 16:19:24 New desktop size: 800 x 600
30/11/2022 16:19:24 New screen layout:
30/11/2022 16:19:24 0x00000040 (output 0x00000040): 800x600+0+0
30/11/2022 16:19:24 Maximum clipboard transfer size: 1048576 bytes
30/11/2022 16:19:24 VNC extension running!
(xfce4-session:26779): xfce4-session-WARNING **: 16:19:38.941: xfsm_manager_load_session: Something wrong with /home/first.last/.cache/sessions/xfce4-session-node049:2, Does it exist? Permissions issue?
(xfwm4:26807): xfwm4-WARNING **: 16:19:38.986: Error opening /dev/dri/card0: Permission denied
To my knowledge, ood_portal.yml is set up correctly (/rnode etc.), and we have Jupyter and RStudio apps working without issue.
Hey, sorry for the trouble. Would it be possible to see the clusters.d/<cluster_name>.yml for this as well? The permission issues at the bottom of vnc.log are not expected, and I’d like to see how the VNC app is configured to submit to its cluster.
Could you also expand the “Failed when connecting: Connection closed (code: 1006)” message to see what error it throws? It looks like the client side is having some problems, and seeing that will help.
Also, is this a new setup or is this a change that was made?
Our OnDemand setup isn’t new but this is the first time we’ve attempted to set up apps that require TurboVNC etc. We currently have Jupyter Notebook/Jupyter Lab/RStudio running without any issues. We have the prerequisites for the interactive desktops set up on one node, and we are currently testing this with the -w option in Slurm to direct the interactive desktops to only start on that node for the time being.
You can see from your console screenshot that you’re trying to connect to a URL with no host in it.
It should be /rnode/<host>/<port>/websockify, but as you can see you only have the <port> portion; the <host> portion is empty.
This is the setting that determines the hostname. The expression hostname -A | awk -F . '{print $1}' appears to work as expected on my system, but something must be off on yours: it appears to be outputting an empty string.
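To make the failure mode concrete, here’s a minimal shell sketch of how that value ends up in the URL. `short_host` and `ws_path` are hypothetical helpers, not OnDemand code. `short_host` is a slightly more defensive variant of the awk expression: it takes the first whitespace-separated name, keeps the part before the first dot, and falls back to a second value (for example `hostname -s`) if that comes out empty. The fallback is an added assumption, motivated by the fact that `hostname -A` skips addresses that have no reverse DNS entry and can therefore print nothing. The FQDN `node049.example.edu` is made up; port 35036 is the websocket port from output.log above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of OnDemand) showing how the host value
# ends up in the path that noVNC connects to.

# Reduce `hostname -A` output (passed in as $1 so this can be tested anywhere)
# to a short name: first whitespace-separated name, then the part before the
# first dot ("[.]" matches a literal dot). If the result is empty, use the
# fallback in $2 (e.g. the output of `hostname -s`).
short_host() {
  local fqdns="$1" fallback="$2" short
  short=$(printf '%s\n' "$fqdns" | awk '{ split($1, part, "[.]"); print part[1] }')
  if [ -z "$short" ]; then
    short="$fallback"
  fi
  printf '%s\n' "$short"
}

# Assemble the /rnode/<host>/<port>/websockify path.
ws_path() {
  printf '/rnode/%s/%s/websockify\n' "$1" "$2"
}

ws_path "" 35036                                          # /rnode//35036/websockify
ws_path "$(short_host 'node049.example.edu ' '')" 35036   # /rnode/node049/35036/websockify

# On a compute node this might be used as:
#   host=$(short_host "$(hostname -A)" "$(hostname -s)")
```

An empty host silently produces a path with a missing segment rather than an error, which is why the symptom only shows up on the client side.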
When the job runs, you can inspect this functionality in job_script_content.sh. There you’ll see something like
# Set host of current machine
host=$(hostname -A | awk -F . '{print $1}')
# removed for brevity
# Create the connection yaml file
create_yml
We set the host variable from this expression, then write it to a YAML file (connection.yml in the same directory), which the dashboard reads to generate a URL (the URL you see in your first screenshot).
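A minimal sketch of that hand-off, under stated assumptions: `write_connection_yml` is a hypothetical, simplified stand-in for create_yml, the real connection.yml contains more fields, and the `host`/`websocket` key names here are illustrative. As a design choice it refuses to write an empty host, which is exactly the condition that produced the /rnode//<port>/websockify URL in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical, simplified stand-in for the job script's create_yml step.
# Writes the host and websocket port to connection.yml in the given directory.
# Key names are illustrative; the real file contains more fields.
write_connection_yml() {
  local dir="$1" host="$2" port="$3"
  # Guard (added assumption): an empty host would yield /rnode//<port>/websockify.
  if [ -z "$host" ]; then
    echo "refusing to write connection.yml: host is empty" >&2
    return 1
  fi
  cat > "${dir}/connection.yml" <<EOF
host: ${host}
websocket: ${port}
EOF
}

# Example: write_connection_yml "$PWD" node049 35036
```

Failing loudly at job start makes an empty host show up in the job output instead of as a vague connection error in the browser.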
Aha - that is exactly it. Thank you. If I go into the noVNC settings I see this:
And if I manually enter the node name where that space is, it works.
This is strange, because when I test that expression on one of the nodes it returns the hostname, so I’m not sure why it’s coming back blank. I’ll do some digging.
Simple solution - we forgot to add that expression to submit.yml.erb! Sorry about that! I knew we had missed something… Thank you both for the direction.
Setting it in the clusters.d file, as you have it above, should have been fine. Setting it in the clusters.d file makes it global for any job that runs on that cluster.
I don’t see any issue with the YAML in your clusters.d file, so I can only assume it’s an issue of updating and reading the file (i.e., you made the updates, but the app didn’t restart and re-read the file?).
In any case, setting it in clusters.d as you have it is likely the better option, because it’s global and you won’t need to set it again in every individual app.
Jeff, one more question on this. It’s probably a novice question, but in which file can we tell the MATLAB app where to find websockify when it’s installed in a non-default location? We had this working on one compute node for testing, and now that we’re deploying it to the rest of the nodes we’re having a bit of trouble with websockify in a non-default location. I appreciate the help.
You should apply the config to the entire cluster so that every VNC application can pick it up. I left the script_wrapper portion here just to expand this example; you may already have a script_wrapper. The important bit is v2.batch_connect.vnc.websockify_cmd.
That said, OOD will also respond to the environment variable $WEBSOCKIFY_CMD. So if you have module(s) that you load, you could set it in one of them, but that’s easy to forget about: it’s embedded somewhere else, and when you look at, say, a clusters.d file, it’s only implicit that the module gets loaded.
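A minimal sketch of the override pattern plus a per-node sanity check, which may help when rolling websockify out to the remaining nodes. Assumptions: `/opt/websockify/run` is treated as the default (verify this for your OnDemand version), `resolve_websockify_cmd` and `check_websockify` are hypothetical helpers, and the executable path is assumed to contain no spaces.

```shell
#!/usr/bin/env bash
# Use $WEBSOCKIFY_CMD when it is set and non-empty, otherwise a default.
# Assumption: /opt/websockify/run is the default; verify for your OOD version.
resolve_websockify_cmd() {
  printf '%s\n' "${WEBSOCKIFY_CMD:-/opt/websockify/run}"
}

# Hypothetical preflight check: confirm the resolved executable exists and is
# executable on this node. Assumes the executable path has no spaces.
check_websockify() {
  local cmd bin
  cmd=$(resolve_websockify_cmd)
  bin="${cmd%% *}"            # first word of the command = the executable
  if [ -x "$bin" ]; then
    printf 'ok: %s\n' "$bin"
  else
    printf 'missing or not executable: %s\n' "$bin" >&2
    return 1
  fi
}

# In a module file or the cluster's script_wrapper you might set, e.g.:
#   export WEBSOCKIFY_CMD=/apps/websockify/run   # hypothetical path
```

Running `check_websockify` on each node (for example from a short test job pinned with -w) would surface a node where the non-default path is missing before a user hits it.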