noVNC: Failed to connect to server

Hello,

We are having some issues setting up interactive desktops. I found a similar discussion here, “noVNC: Failed to connect to server”, but its solution doesn’t appear to apply to us.

Issue is the same. We get “noVNC: Failed to connect to server” on the client side.

This is the output.log:


Desktop 'TurboVNC: node049:2 (first.last)' started on display node049:2

Log file is vnc.log
Successfully started VNC server on :5902...
Script starting...
Starting websocket server...
Resetting modules to system default. Reseting $MODULEPATH back to system default. All extra directories will be removed from $MODULEPATH.
Launching desktop 'xfce'...
/usr/local/lib/python3.6/site-packages/websockify-0.10.0-py3.6.egg/websockify/websocket.py:31: UserWarning: no 'numpy' module, HyBi protocol will be slower
  warnings.warn("no 'numpy' module, HyBi protocol will be slower")
WebSocket server settings:
  - Listen on :35036
  - No SSL/TLS support (no cert file)
  - Backgrounding (daemon)
Scanning VNC log file for user authentications...
Generating connection YAML file...
generating cookie with syscall
generating cookie with syscall
generating cookie with syscall
generating cookie with syscall

And the vnc.log:

TurboVNC Server (Xvnc) 64-bit v2.2.5 (build 20200507)
Copyright (C) 1999-2020 The VirtualGL Project and many others (see README.txt)
Visit http://www.TurboVNC.org for more information on TurboVNC

30/11/2022 16:19:24 Using security configuration file /etc/turbovncserver-security.conf
30/11/2022 16:19:24 Enabled security type 'tlsvnc'
30/11/2022 16:19:24 Enabled security type 'tlsotp'
30/11/2022 16:19:24 Enabled security type 'tlsplain'
30/11/2022 16:19:24 Enabled security type 'x509vnc'
30/11/2022 16:19:24 Enabled security type 'x509otp'
30/11/2022 16:19:24 Enabled security type 'x509plain'
30/11/2022 16:19:24 Enabled security type 'vnc'
30/11/2022 16:19:24 Enabled security type 'otp'
30/11/2022 16:19:24 Enabled security type 'unixlogin'
30/11/2022 16:19:24 Enabled security type 'plain'
30/11/2022 16:19:24 Desktop name 'TurboVNC: node049:2 (first.last)' (node049:2)
30/11/2022 16:19:24 Protocol versions supported: 3.3, 3.7, 3.8, 3.7t, 3.8t
30/11/2022 16:19:24 Listening for VNC connections on TCP port 5902
30/11/2022 16:19:24   Interface 0.0.0.0
30/11/2022 16:19:24 Framebuffer: BGRX 8/8/8/8
30/11/2022 16:19:24 New desktop size: 800 x 600
30/11/2022 16:19:24 New screen layout:
30/11/2022 16:19:24   0x00000040 (output 0x00000040): 800x600+0+0
30/11/2022 16:19:24 Maximum clipboard transfer size: 1048576 bytes
30/11/2022 16:19:24 VNC extension running!

(xfce4-session:26779): xfce4-session-WARNING **: 16:19:38.941: xfsm_manager_load_session: Something wrong with /home/first.last/.cache/sessions/xfce4-session-node049:2, Does it exist? Permissions issue?

(xfwm4:26807): xfwm4-WARNING **: 16:19:38.986: Error opening /dev/dri/card0: Permission denied

To my knowledge we have ood_portal.yml set up correctly (/rnode etc.), and we have the Jupyter and RStudio apps working without issue.

This is what I see in the console:

The websockify/VNC processes start on the compute node without issues as well. Not sure what we might be missing. Any advice is appreciated.

Hey sorry for the trouble. Would it be possible to see the clusters.d/<cluster_name>.yml for this as well? The permission issues at the bottom of the vnc.log are not expected to be there and I’d like to see how the app is configured for vnc to submit to its cluster.

Sure, thank you for the help:

---
v2:
  metadata:
    title: "RT"
  login:
    host: "hostname.university.edu"
  job:
    adapter: "slurm"
    host: "hostname.university.edu"
    cluster: "rt_slurm"
    bin: "/usr/bin"
    #conf: "/etc/slurm"
    bin_overrides:
        sbatch: "/etc/ood/config/wrappers/rt/bin/sbatch"
        squeue: "/etc/ood/config/wrappers/rt/bin/squeue"
        scontrol: "/etc/ood/config/wrappers/rt/bin/scontrol"
        scancel: "/etc/ood/config/wrappers/rt/bin/scancel"
  batch_connect:
    basic:
        script_wrapper: |
          module purge
          %s
        set_host: "host=$(hostname -A | awk -F . '{print $1}')"
    vnc:
        script_wrapper: |
          module purge
          export PATH="/opt/TurboVNC/bin:$PATH"
          export WEBSOCKIFY_CMD="/usr/local/bin/websockify"
          %s
        set_host: "host=$(hostname -A | awk -F .'{print $1}')"

Thanks for the info. Nothing here is jumping out at me as looking off.

What about ood_portal.yml, especially the section covering how the proxy is currently set up?

This is all we have in ood_portal.yml currently:

servername: hostname.university.edu

ssl:
  - 'SSLCertificateFile "/admin/ondemand.crt"'
  - 'SSLCertificateKeyFile "/admin/ondemand.key"'
  - 'SSLCertificateChainFile "/admin/DigiCertCA.crt"'

lua_log_level: 'debug'

host_regex: '(node|big-mem|gpu)\d+'
node_uri: '/node'
rnode_uri: '/rnode'

auth:
  - 'AuthType Mellon'
  - 'Require valid-user'

user_map_cmd: '/etc/ood/scripts/lowercase_username'

Could you expand the “Failed when connecting: Connection closed (code: 1006)” entry to see what that error throws? It looks like the client side is having some problems, and seeing that will help.

Also, is this a new setup or is this a change that was made?

Sure, this is it expanded:

Failed when connecting: Connection closed (code: 1006) rfb.js:721:21
    _fail https://ondemand.university.edu/pun/sys/dashboard/noVNC-1.1.0/core/rfb.js:721
    RFB https://ondemand.university.edu/pun/sys/dashboard/noVNC-1.1.0/core/rfb.js:233
    onclose https://ondemand.university.edu/pun/sys/dashboard/noVNC-1.1.0/core/websock.js:200

Our OnDemand setup isn’t new but this is the first time we’ve attempted to set up apps that require TurboVNC etc. We currently have Jupyter Notebook/Jupyter Lab/RStudio running without any issues. We have the prerequisites for the interactive desktops set up on one node, and we are currently testing this with the -w option in Slurm to direct the interactive desktops to only start on that node for the time being.

I can’t see anything off in the configs and those errors don’t mean much to me at the moment.

I’ll have to investigate more on my side because I’m not sure what is going wrong here or why this wouldn’t currently work.

You can see from your console image you’re trying to connect to a URL with no host in it.

It should be /rnode/<host>/<port>/websockify, but as you can see you only have the <port> portion; the <host> portion is empty.
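
As a rough illustration of why an empty host breaks the connection, here is a minimal shell sketch. The function names build_rnode_url and host_matches are hypothetical, the port 35036 is the websockify port from the output.log above, and the anchored pattern is an assumption based on the host_regex in the posted ood_portal.yml (with \d written as [0-9] for grep -E). It is not OnDemand's own code.

```shell
#!/bin/sh
# build_rnode_url HOST PORT: print the reverse-proxy path for a VNC session.
# Hypothetical helper that mirrors the /rnode/<host>/<port>/websockify shape.
build_rnode_url() {
  printf '/rnode/%s/%s/websockify\n' "$1" "$2"
}

# host_matches HOST: succeed if HOST fits the host_regex from ood_portal.yml.
# Assumption: the whole host segment has to match the pattern.
host_matches() {
  printf '%s\n' "$1" | grep -Eq '^(node|big-mem|gpu)[0-9]+$'
}

build_rnode_url "node049" "35036"   # /rnode/node049/35036/websockify
build_rnode_url "" "35036"          # /rnode//35036/websockify

if host_matches ""; then
  echo "empty host matches host_regex"
else
  echo "empty host does not match host_regex"
fi
```

With an empty host the path collapses to /rnode//<port>/websockify, and an empty string cannot match a pattern that requires a node name, so the proxy has nowhere to send the WebSocket.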

The set_host setting is what determines the hostname. The expression hostname -A | awk -F . '{print $1}' appears to work as expected on my system but must somehow be going wrong on yours: it appears to be outputting an empty string.

When the job runs, you can inspect this functionality in job_script_content.sh. There you’ll see something like

# Set host of current machine
host=$(hostname -A | awk -F .'{print $1}')

# removed for brevity

# Create the connection yaml file
create_yml

We’re setting the host environment variable from this expression, then writing it to a YAML file (connection.yml in the same directory) where the dashboard can then read it and generate a URL (the URL you see in the first image you’ve given here).
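
One possible way for that expression to come out empty, sketched below under stated assumptions: the set_host line in the vnc section of the clusters.d file posted earlier (and the job script line quoted above) has no space between the . and the quoted awk program, unlike the basic section. The shell then passes both pieces to awk as a single argument, so awk gets a field separator but no program. The FQDN here is a made-up stand-in for the output of hostname -A.

```shell
#!/bin/sh
# Hypothetical FQDN standing in for the output of `hostname -A`.
fqdn="node049.cluster.example.edu"

# With a space: awk receives "." as the field separator and
# '{print $1}' as the program, so it prints the short host name.
with_space=$(printf '%s\n' "$fqdn" | awk -F . '{print $1}')

# Without the space: the shell joins both pieces into the single argument
# ".{print $1}", which awk takes as the separator. No program text is left,
# so awk complains on stderr and writes nothing to stdout.
without_space=$(printf '%s\n' "$fqdn" | awk -F .'{print $1}' 2>/dev/null) || true

echo "with space:    '${with_space}'"
echo "without space: '${without_space}'"
```

Running the same two commands by hand on a compute node, with hostname -A in place of the stand-in, is a quick way to check which form a given config actually produces.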

Aha - that is exactly it. Thank you. If I go into the noVNC settings I see this:

[Screenshot: noVNC settings]

And if I manually enter the node name where that space is, it works.

This is strange because if I test that expression on one of the nodes, it returns the hostname. So I’m stuck on why we’re having issues with it returning as blank. I’ll do some digging.

Simple solution - we forgot to add that expression to submit.yml.erb! Sorry about that! I knew we had missed something… Thank you both for the direction.

Setting it in the clusters.d file as you have it above should have been fine. Setting it in the clusters.d file means it’s global for any job that runs on that cluster.

Which is to say, the setting in clusters.d should have worked for you. I don’t see any issue with the YAML in your clusters.d file, so I can only assume it’s an issue of updating and re-reading the file (i.e., you made the updates but the app didn’t restart and re-read the file?).

In any case, setting it in clusters.d as you have it should have worked (should have!) and is likely the better option, because it’s global and you won’t need to set it again in every individual app.

Jeff, one more question on this. I feel like this is pretty novice, but in what file can we tell MATLAB where to find websockify when it is in a non-default location? We had this working on one compute node for testing and are now deploying it to the rest of the nodes, and we are having a bit of trouble with websockify being in a non-default location. I appreciate the help.

You should apply the config to the entire cluster, so that every VNC application can pick it up. I left the script_wrapper portion here just to expand this example; you may already have a script_wrapper. The important bit is v2.batch_connect.vnc.websockify_cmd.

# /etc/ood/config/clusters.d/my_cluster.yml
---
v2:
  metadata:
    # ...
  job:
    # ...
  batch_connect:
    basic:
      script_wrapper: "module restore\n%s"
    vnc:
      script_wrapper: "module restore\n%s"
      websockify_cmd: "/opt/websockify/run"

That said, OOD will also respond to the environment variable $WEBSOCKIFY_CMD. So if you do have some module(s) that you load, you could set it in that module, but that’s easy to forget about: the setting is embedded in the module, and nothing in, say, a clusters.d file shows that it gets loaded.
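
When rolling this out to more nodes, a small check like the following could help confirm that whatever path ends up in $WEBSOCKIFY_CMD actually exists on each node. This is only a sketch: resolve_websockify and check_websockify are hypothetical helpers, and the fallback /opt/websockify/run is taken from the example above rather than from anything verified about your install.

```shell
#!/bin/sh
# resolve_websockify: print the websockify path a job would end up using.
# Assumption: /opt/websockify/run is the fallback when $WEBSOCKIFY_CMD is unset.
resolve_websockify() {
  printf '%s\n' "${WEBSOCKIFY_CMD:-/opt/websockify/run}"
}

# check_websockify: succeed only if the resolved path is an executable file.
check_websockify() {
  cmd=$(resolve_websockify)
  if [ -x "$cmd" ] && [ ! -d "$cmd" ]; then
    echo "websockify found: $cmd"
  else
    echo "websockify missing or not executable: $cmd" >&2
    return 1
  fi
}
```

For example, after export WEBSOCKIFY_CMD="/usr/local/bin/websockify" (the path from the script_wrapper posted earlier), calling check_websockify on a compute node reports whether that path is usable there.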