Regarding launching an interactive session through VNC

Hello,

After many obstacles, I finally managed to install and configure Open OnDemand, and the interactive session is working fine as well.

Now I have a question about the interactive session. I have XMATE, XFCE, TurboVNC, and Websockify running on the Slurm compute node (scompute). When I launch the interactive shell, the terminal opens on the Slurm control node (slurmctl), which is the host I set in the cluster.yml file. In my setup, the interactive session works only when I run “ssh scompute” from the Slurm control node (slurmctl). If I don’t run “ssh scompute” from slurmctl, the interactive session does not work.

Based on the above setup, I am curious: does Open OnDemand typically rely on SSH to start interactive sessions on compute nodes, for security and session management reasons?
I look forward to getting clarity on the Open OnDemand setup.

Thanks in advance!

Best Regards,
Hariharan

Great question.

It looks like you are asking about VNC apps specifically, so I’ll try to write out that flow, as it is a bit mysterious to many users and I don’t think we have a good diagram showing it.

To start any interactive job with OOD, you must first be able to ssh to the login node you set. The flow looks like this (as far as I know):

  • OOD ssh’s to the login node (under login and host in cluster.yml)
  • runs the sbatch command that was generated
  • gets back a job ID from SLURM

At that point, ssh is done. Now SLURM will run that script on a compute node by:

  • starting up TurboVNC on a port
  • starting up Websockify to connect to that port
  • writing out the data needed for the connection.yml file
  • once the job writes connection.yml to the user’s home directory, OOD detects it appearing on the shared filesystem. OOD also uses squeue separately to track job state, but the “Launch” button appearing in the session card is driven just by the connection.yml file appearing.
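
The last step above, where the session card waits for connection.yml to show up on the shared filesystem, can be mimicked from a shell on the OOD server. This is only a rough sketch of the idea and not how OOD implements it internally; the function name `wait_for_connection_file` and the one-second polling interval are made up for illustration.

```shell
# Hypothetical helper: poll a session directory until connection.yml
# appears, or give up after a timeout in seconds (default 30).
wait_for_connection_file() {
  local session_dir="$1"
  local timeout="${2:-30}"
  local waited=0
  while [ ! -f "$session_dir/connection.yml" ]; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "no connection.yml in $session_dir after ${timeout}s" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "found $session_dir/connection.yml"
}
```

If the job is running according to squeue but this never succeeds, the job script most likely failed before it got as far as writing the file.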

So that’s the flow of how this works for a VNC interactive app as far as I know. Hopefully this helps, but let me know if there’s still anything confusing or if I didn’t answer quite the question you were getting at.

Hi Travis,

Many thanks for your kind response! I am still not clear, so let me explain my setup in more detail.

In my setup, I don’t have a login node; I have only a Slurm control node (slurmctl) and a Slurm compute node (scompute) for the Slurm cluster. We have an NFS-shared home directory that is shared among the OOD, slurmctl, and scompute nodes. I have configured the SSH keys so they are shared among all these nodes. When I launch the cluster Shell Access (under Clusters), it logs in (passwordless) to the Slurm control node (slurmctl). I then have to run “ssh scompute” in order to launch the interactive session. If I launch the interactive session without “ssh scompute”, VNC fails, which I don’t understand. Is this the expected behaviour of Open OnDemand? Would it be possible to clarify this based on my setup?

Looking forward to hearing from you.

Thanks once again!

Best Regards,
Hariharan

Ok, I think I’m understanding this better. Also, just FYI, this is more of a Get Help topic than General Discussion 🙂

Your setup is not standard (typically OOD uses a dedicated login node separate from the Slurm controller), but I think it should still work with some changes. Side note: it’s worth considering adding a proper login node later, because having users and OOD SSH directly into your Slurm controller is not ideal for stability or security.

For now, your VNC issue is probably a hostname resolution problem, not a Slurm problem. Your compute node is probably reporting one hostname in connection.yml that the OOD server can’t resolve.

A possible fix is to add this to your job script in template/script.sh.erb:

export host=$(hostname -f)

This forces the compute node to report its fully qualified hostname. Then check your connection.yml after a failed session to confirm what hostname OOD is trying to reach, and make sure that hostname resolves from the OOD server. The game is to check which hostname OOD has in that file and to see whether you can reach it from the OOD server.
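
One way to run that check from the OOD server is sketched below. It assumes connection.yml is a flat `key: value` file with a `host` entry and that `getent` is available, as it is on most Linux systems. The function names `conn_host` and `host_resolves` are illustrative and not part of OOD.

```shell
# Print the value of the "host" key from a connection.yml file.
conn_host() {
  awk -F': *' '$1 == "host" { print $2; exit }' "$1"
}

# Succeed if the given hostname resolves on this machine.
# Run this on the OOD server, since that is where the lookup matters.
host_resolves() {
  getent hosts "$1" > /dev/null
}

# Usage (the path is a placeholder for a real session directory):
#   h=$(conn_host /path/to/session/connection.yml)
#   if host_resolves "$h"; then echo "$h resolves"; else echo "$h does NOT resolve"; fi
```

If the name does not resolve, the fix belongs in DNS or /etc/hosts on the OOD server, or in whatever sets `host` in the job script.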

I’d also check to see if you have any aliases set in /etc/hosts, because it sounds like you are just setting up a test or dev server to play with, and you might have some entries in there confusing things too. I know that’s bitten me in some dev setups before.

I hope some of that helps or makes sense, but let me know if you need some more guidance!

Also, you may find the logging docs useful for debugging and for knowing where to look: Logging — Open OnDemand 4.1.0 documentation

Hi Travis,
There are two things I want to discuss in detail with you.

  1. Many thanks again for your kind response! Adding the below in template/script.yml.erb did the trick. I can now launch the VNC session directly from the interactive shell terminal, without first running “ssh scompute” to reach the Slurm compute node.

    v2:
      metadata:
        title: "g_acct_hlr cluster"
      login:
        host: slogin.lan.csc.uni-frankfurt.de
        ssh_opts:
          - "-o StrictHostKeyChecking=no"
          - "-o UserKnownHostsFile=/dev/null"
      job:
        adapter: "slurm"
        cluster: "g_acct_hlr"
        submit_host: slogin.lan.csc.uni-frankfurt.de # or your cluster login node
        ssh_opts:
          - "-o StrictHostKeyChecking=no"
          - "-o UserKnownHostsFile=/dev/null"
        sbatch: "/usr/bin/sbatch"
        squeue: "/usr/bin/squeue"
        scancel: "/usr/bin/scancel"
        sinfo: "/usr/bin/sinfo"
        bin: "/usr/bin"
        conf: "/etc/slurm/slurm.conf"
        copy_environment: false
      batch_connect:
        basic:
          script_wrapper: |
            module purge
            %s
          set_host: "host=$(hostname -A | awk '{print $1}')"
        vnc:
          script_wrapper: |
            module purge
            # Add the TurboVNC installation directory to the PATH
            export PATH="/opt/TurboVNC/bin:$PATH"
            export WEBSOCKIFY_CMD="/usr/bin/websockify"
            %s
          set_host: "host=$(hostname -A | awk '{print $1}')"

Hi Travis,
Today when I ran a VNC interactive session, I experienced the same problem again: it does not work without “ssh scompute”. Last night it worked well, but today it again works only with “ssh scompute”, even though the line below has been added to template/script.sh.erb.

Your help or suggestions are needed.

Thanks in advance!

Please do not open separate threads for a problem you are already receiving help on. We are a very small team operating with very limited resources and simply spamming us won’t get your problem fixed any sooner but will get you banned if you continue.

Also, when you post configs, format them. That has rendered as markdown above which is incredibly frustrating for me on this end to look over. It’s 2026, learn markdown or use an LLM but please look over what you post before you post another thread for the same problem.

Hi Travert,

Sorry, but your message is very harsh, and this is not the way to respond. There is a polite way to convey the message. You advised me to post in the Help section (see what you wrote above), so I posted it again there because I thought I had posted in the wrong place, which I wasn’t aware of; this discussion might also extend further, so it seemed better to post in the right section as per your advice. You could have conveyed this politely, and I would have removed it from the Help section. I am not here to frustrate you (I totally understand your concern and I respect it), and I am not that good at Markdown. I just wanted to share the config with you, that’s it. If you don’t like it, I can remove it. There is nothing to ban anyone for, as people here are not committing a crime. You should have the patience to answer whatever it is (even if it takes time, that’s OK; I am ready to wait). So please try to respond professionally if possible! If you don’t want to help, that’s OK, but please learn to respond to people politely. I am ready to wait to get help, and I respect the way you help. It seems the post in the Help section was removed; if so, that’s fine. Let me know if I can get help here.

Please don’t take my message the wrong way, as I am happy to follow whatever is convenient for you.

I am sorry for the inconvenience caused! I wasn’t aware, but I will take care from now on.

Thanks once again!

Hariharan:

As Travis mentioned, we have a very small dev team that does its best to answer questions here. But they also need to prioritize actual development and fielding requests from a lot of community members.

We do have a paid support subscription you might want to consider (details available at Subscribe | Open OnDemand). This would provide you with more prioritized support and might help you get things working more quickly. I’ll also note that by subscribing to the support program you are helping with the long-term sustainability of this completely open source product.

Hi Alanc,

Many thanks for your kind response! I totally understand, and I have no problem waiting for your response. I know that it’s not easy for a small team to manage so many requests, and I really respect the way you work. Please also note that I am very happy to get your support even if it takes time. As for Travert, I am happy to coordinate with him; that is not a problem for me. Now that I understand his difficulty, I am fine following up with him, and I have no hard feelings. I will also look into the paid subscription.

Thanks for your kind response!

Hi Travis,
Hope you are doing well.
I just want to let you know that I have modified the cluster.yml file and added a Slurm login node. I have resolved the “Host key verification failed” error, and users can now submit jobs through the login host, as you suggested based on the workflow.

Now the only problem is with the VNC interactive session. It works only when I manually run “ssh scompute” (the Slurm compute node); otherwise the VNC session does not start automatically. I read that Open OnDemand automatically starts the VNC session on the Slurm compute node, but in my case it fails. Let me know if there is anything I have to modify or check for this. Your help is needed.

Many thanks again for your continuous help!

Thanks for the update and patience here, and glad you were able to resolve one of the errors.

So, let’s not do anything with the scompute command going forward. We want to get Open OnDemand to run as it should, and to do that we need to use only the logs to guide us, no hacking around. The session log doc entry is here: Logging — Open OnDemand 4.1.0 documentation

Now, when you submit the VNC job and get a failure, stop: do not do anything with ssh or anything else. Go into the session logs, look at the output.log file, and see what is happening. We provide a lot of logging in Open OnDemand for this reason, and the link above is your friend here. That output.log is what we need to see. We will also want to check whether the host is right by looking at the connection.yml file in that same session data directory.
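
A small sketch for pulling up those two files for the most recent session follows. The session root in the usage comment is an assumption based on the default per-user data location, and the exact path varies by app, so check your own install. `latest_session_dir` is a made-up helper name.

```shell
# Print the most recently modified session directory under a given root.
# The printed path keeps its trailing slash.
latest_session_dir() {
  ls -1td "$1"/*/ 2>/dev/null | head -n 1
}

# Usage, assuming the default data root (replace <app> with your app's name):
#   root="$HOME/ondemand/data/sys/dashboard/batch_connect/sys/<app>/output"
#   dir=$(latest_session_dir "$root")
#   tail -n 50 "${dir}output.log"     # what the job script printed
#   cat "${dir}connection.yml"        # the host OOD will try to reach
```

Run this as the user who launched the session, right after the failure and before touching anything else.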

I am also reading back through all this and seeing that you seem to have set_host set in two different files. Is that right? It looks like you have it set in the cluster.yml and in the script.sh from the first post. If so, remove the cluster.yml entry and set it only in the script.sh for now so we actually know which file sets the host, then use the logs to see if that host is correct.
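
To see every place the host is being set, something like the following can help. It is a sketch: `find_host_settings` is a made-up name, and the file paths in the usage comment are guesses at a typical install rather than paths taken from this thread.

```shell
# List every line that sets the host, with file name and line number.
# Matches the cluster config key (set_host:) as well as a script-level
# assignment such as "export host=..." or "host=...".
find_host_settings() {
  grep -nH -E 'set_host:|(^|[[:space:]])(export[[:space:]]+)?host=' "$@"
}

# Usage (paths are assumptions, adjust to your install):
#   find_host_settings /etc/ood/config/clusters.d/*.yml \
#                      /path/to/your/app/template/script.sh.erb
```

If matches show up in more than one file, keep only one of them while debugging so it is clear which setting ends up in connection.yml.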

And please, format your logs and any configs when you post them back in; it makes it significantly easier on my side to see these things. Because you posted that original config unformatted, I didn’t see you were setting set_host twice until today. Formatting is appreciated and makes our work much more efficient.

Hi Travert,

Hope you are doing well. I am back with a new problem with the TurboVNC session (which is quite important for me to resolve now). Below is the exact problem description.

TurboVNC is installed and running on the Slurm compute node. When I reboot this compute node, it boots automatically into the MATE desktop GUI login screen (which may be how the GUI is meant to work). If I want to launch a Slurm job through the Open OnDemand interactive apps, I first have to log in to the MATE desktop session physically on the compute node in the server room. With my session already open there, launching an interactive VNC session through Open OnDemand works perfectly. This physical login seems to be needed only once, until the next reboot. If I don’t log in physically on the compute node with the MATE desktop, the VNC session from Open OnDemand does not launch remotely. The pre-login cannot be done remotely either, so I have to do it in the server room, and only then does the VNC session work remotely. Do you have any insight into this issue, or would you recommend doing something? I am not an expert in desktop sessions, so your thoughts on this are needed.

Many thanks again!

Looking forward to hearing from you.

@harjad84 You can prevent the system from booting directly into the GUI using this command:

systemctl set-default multi-user.target

on RHEL or derivative Linux distros; you may have to alter the command a little bit depending on your OS. You will need to reboot the server after making the change in order for it to take effect, and confirm that it no longer enters the GUI automatically.

Hi Anderss,
Many thanks for your kind suggestion! Yes, it worked well: the node now boots to a terminal and no longer lands in the GUI.

Many thanks again! The only problem now is with the VNC interactive session, which does not launch automatically; I still have to run “ssh scompute” (the Slurm compute node) before I launch the interactive session. This is a serious problem that I have to resolve. If you have any suggestions, please let me know. I would be grateful!

Thanks once again!

@travert I have uploaded the output.log file (I formatted it by removing the repeated error messages). I don’t know if I understood what you said above correctly, but I have formatted it as much as I can to make it easier for you. Sorry if this is not the correct way. Please check this output file when you get time and let me know if anything needs to be done. I don’t see this error if I ssh to the compute node before I launch the VNC session; in that case the VNC session launches, but the web browsers and other tools do not work. If I log in physically to the MATE desktop in the server room first, everything works like a charm, since my session is already active.

Note: We have this OOD server and the Slurm servers (Slurm login, Slurm control, and Slurm compute) running in the same internal network. Of these, only the OOD server is open to the outside world.

Thanks in advance!

Looking forward to hearing from you.

output_log.txt (6.0 KB)