One of my users reported that clicking on the button with the nodename, shown below, doesn’t work:
I didn’t realize that was there, so I hadn’t tested it, but they are right. I get the following error:
“Failed to establish a websocket connection. Be sure you are using a browser that supports websocket connections.”
I’m aware that Safari doesn’t support this, but it doesn’t work in Firefox or Chrome either. We run the Desktop app via a Singularity container, if that makes a difference.
How can I go about troubleshooting this? I’m not sure what this function is called (hence the clunky title), so I’m not sure if there are any docs.
Does your site allow folks to ssh into compute nodes? There’s a way to disable this feature altogether, so the button won’t appear (though I’ll have to look it up if you’d like that).
But essentially, it works as you indicate - just uses the shell app to shell into a compute node. My guess is you have some connectivity issues or just don’t allow shelling into compute nodes.
If you want to allow this, you have to enable your compute nodes through the shell allow list. Here’s our allowlist for our compute nodes.
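For anyone reading along, here is a rough sketch of how an allowlist of this kind can be checked against a hostname. It assumes the variable is `OOD_SSHHOST_ALLOWLIST` and that it holds a colon-delimited list of glob patterns; the hostnames are made up, and Python's `fnmatch` only approximates whatever glob matching the shell app itself uses.

```python
from fnmatch import fnmatchcase


def host_allowed(host: str, allowlist: str) -> bool:
    """Return True if host matches any glob in a colon-delimited allowlist."""
    # Assumption: patterns are separated by ':' and empty entries are ignored.
    patterns = [p.strip() for p in allowlist.split(":") if p.strip()]
    return any(fnmatchcase(host, pattern) for pattern in patterns)


# Hypothetical allowlist value, not taken from any real site.
ALLOWLIST = "node[0-9][0-9].example.edu:gpu*.example.edu"

print(host_allowed("node07.example.edu", ALLOWLIST))  # compute node covered by the first glob
print(host_allowed("login1.example.edu", ALLOWLIST))  # not covered by either glob
```

An empty or undefined allowlist matches nothing in this sketch, which would fit the behaviour described here where the button fails until the variable is defined.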
Thanks, Jeff! The issue was that the environment variable was not defined. I hadn’t realized it existed. It’s actually not mentioned anywhere in the apps/shell/README.md, or in the docs for the shell app on GitHub that I can see. The only variable mentioned is DEFAULT_SSHHOST. Is there a doc somewhere that might explain the difference between DEFAULT_SSHHOST and OOD_DEFAULT_SSHHOST, for example?
At our site, anyone who’s got a job running on a compute node can SSH from the login node to that compute node (we use the PAM module that comes with SLURM to control that), and a running desktop session satisfies that requirement. This does seem to work as expected with the right allow list.
It would be nice if this could accept the same list as /etc/ood/config/ood_portal.yml's host_regex parameter, or even inherit that list, since it seems likely that you’d want it to default to allowing SSH to the same hosts, if you want to enable SSH at all.
This has somehow stopped working for me again, despite having that environment variable defined and this having been working before. Is there anything else one can use to troubleshoot? I’m getting the same error as the original. Just in case I wrote a bad allow list, I put only the host I’m trying to connect to in there (and did touch tmp/restart.txt in the appropriate place). I don’t see any log files anywhere. I confirmed I can freely SSH between the OOD server and the compute node in question.
I think the best practice for restarting is the ‘Restart Web Server’ link in the help menu. I don’t know where you touched tmp/restart.txt, but that implies you’re restarting your development version of the dashboard. Is this where you’re working? Touching the system restart file (I’d have to look up where that is) will restart everyone’s dashboard.
I’m also not sure if we log any of our denials - which is something we should remedy.
It’s a test system, so it doesn’t really matter, but I’ve instead tried “Restart Web Server” and am having the same problem. I even took the node name from the URL that launches the shell app and searched the /etc/ood/config/apps/shell/env file to make sure I had no typo. I currently only have the node I have a session on defined in the file to eliminate any chance of a more complicated error.
Is there any debugging I could enable to see what’s going on?
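One way to double-check the env file outside the app is a small script like the one below. It is only a sketch under assumptions: the variable name `OOD_SSHHOST_ALLOWLIST`, the simple `KEY=VALUE` parsing, and the hostnames are all illustrative, and `fnmatch` may not behave exactly like the shell app's own matching. It does make mismatches such as a short node name versus a fully qualified name easy to spot.

```python
from fnmatch import fnmatchcase


def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blanks and '#' comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and one layer of quotes around the value.
        values[key.strip()] = value.strip().strip("\"'")
    return values


def explain(host: str, env_text: str, variable: str = "OOD_SSHHOST_ALLOWLIST") -> str:
    """Report whether host matches any colon-delimited glob stored in variable."""
    env = parse_env(env_text)
    if variable not in env:
        return f"{variable} is not set"
    patterns = [p for p in env[variable].split(":") if p]
    hits = [p for p in patterns if fnmatchcase(host, p)]
    if hits:
        return f"{host} matches {hits[0]!r}"
    return f"{host} matches none of {patterns!r}"


# Hypothetical file contents; on a real system, read /etc/ood/config/apps/shell/env instead.
SAMPLE = 'OOD_SSHHOST_ALLOWLIST="node07.example.edu"\n'
print(explain("node07", SAMPLE))              # short name taken from a URL
print(explain("node07.example.edu", SAMPLE))  # fully qualified name
```

Pass in the node name exactly as it appears in the URL that launches the shell app, since that is presumably the string the allowlist is compared against.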
Unfortunately, I’m not sure if there’s an easy way to see this info unless you hack the shell app.
I think you should first confirm it’s indeed an issue with the allowlist. You should get a 401 with this error message if that’s what’s happening now. (I pulled this from the network tab of my browser.)
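If the browser's network tab doesn't show a response for the wss:// request, the status code of the WebSocket handshake can also be checked by hand. The sketch below sends a minimal opening handshake and reads back only the HTTP status. Everything site-specific is an assumption: the URL, the Origin value, and the session cookie (which would have to be copied from a logged-in browser session) are placeholders, and the `probe` call is left commented out so nothing touches the network by default.

```python
import base64
import os
import socket
import ssl
from urllib.parse import urlsplit


def build_handshake(url: str, origin: str, cookie: str = "") -> bytes:
    """Build a minimal WebSocket opening handshake (RFC 6455) for a ws/wss URL."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    key = base64.b64encode(os.urandom(16)).decode("ascii")
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {parts.netloc}",
        "Upgrade: websocket",
        "Connection: Upgrade",
        f"Sec-WebSocket-Key: {key}",
        "Sec-WebSocket-Version: 13",
        f"Origin: {origin}",  # browsers send this on every WebSocket handshake
    ]
    if cookie:
        lines.append(f"Cookie: {cookie}")
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def parse_status(response: bytes) -> int:
    """Extract the numeric status code from a raw HTTP response."""
    status_line = response.split(b"\r\n", 1)[0].decode("ascii", "replace")
    return int(status_line.split()[1])


def probe(url: str, origin: str, cookie: str = "") -> int:
    """Send the handshake and return the status (101 means the upgrade was accepted)."""
    parts = urlsplit(url)
    secure = parts.scheme == "wss"
    port = parts.port or (443 if secure else 80)
    sock = socket.create_connection((parts.hostname, port), timeout=10)
    try:
        if secure:
            context = ssl.create_default_context()
            sock = context.wrap_socket(sock, server_hostname=parts.hostname)
        sock.sendall(build_handshake(url, origin, cookie))
        return parse_status(sock.recv(4096))
    finally:
        sock.close()


# Hypothetical values; replace them with your own before uncommenting.
# print(probe("wss://ood.example.edu/pun/sys/shell/ssh/node07.example.edu",
#             "https://ood.example.edu", cookie="<session cookie from the browser>"))
```

A 101 would mean the upgrade was accepted, while a 401 would be consistent with the denial described above. Any other code likely points at something in front of the shell app, such as the proxy or authentication layer.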
Mine still doesn’t look exactly like yours (maybe a version difference), and while I did find response headers on the second item in the list on the left, there are no response headers for the URL with wss:// – you can see there’s no response tab there where there is for this one.