Does our cluster's design support Interactive Desktop?

Hello!

Been scouring the forums to find someone with a similar problem but can’t seem to find a solution.

I am trying to configure an interactive desktop (XFCE) on our compute nodes. I think I’m 99% there, but when I start a desktop and click Launch, the new tab opens the VNC client with the error:
“Failed to connect to Server”

Our Open OnDemand server is running on its own separate VM and only has access to the HPC login (head) node, as well as the shared NFS datastores.
The compute nodes are on their own isolated 10.0.0/24 network, but can reach external services through the head nodes via NAT. The OOD server can’t connect directly to the compute nodes since they’re on a private subnet.

I tried to configure the OOD server with a reverse proxy configuration, but that did not solve it either.

Is this even possible with the design we have?

Thanks in advance,
Ivan

Yea I think that’s it. The flow is

you the client -> OOD apache -> the compute node.

We use Apache as the point of entry and proxy requests from it to the compute nodes. This lets you reach the services running on the compute nodes (like websockify in the case of desktops/VNC applications) without connecting to the compute node directly yourself.
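
As a rough illustration of that flow, here is a small Python sketch that maps a browser request path onto the compute-node address Apache would open a connection to. The route pattern, the `upstream_for` helper, and the host `node01` are assumptions made up for illustration; the real rules are generated from ood_portal.yml and may differ between versions.

```python
import re

# Assumed route shape: /node/<host>/<port>/... or /rnode/<host>/<port>/...
# (the real pattern is generated from ood_portal.yml and may differ).
ROUTE = re.compile(
    r"^/(?P<prefix>node|rnode)/(?P<host>[^/]+)/(?P<port>\d+)(?P<rest>/.*)?$"
)


def upstream_for(path):
    """Return (host, port, forwarded_path) for a proxied request, or None."""
    match = ROUTE.match(path)
    if match is None:
        return None
    host = match.group("host")
    port = int(match.group("port"))
    if match.group("prefix") == "node":
        # Assumption: /node forwards the full original path to the backend.
        forwarded = path
    else:
        # Assumption: /rnode strips the /rnode/<host>/<port> prefix.
        forwarded = match.group("rest") or "/"
    return host, port, forwarded


if __name__ == "__main__":
    # Hypothetical desktop session: websockify listening on node01:5901.
    print(upstream_for("/rnode/node01/5901/websockify"))
```

The point to take away is that the TCP connection to `<host>:<port>` is opened by the OOD server itself, so the OOD server needs a network route to the compute node.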

Also Hi and welcome!

@jeff.ohrstrom Thanks for the confirmation.

I thought maybe the “reverse” proxy referred to the compute node initiating the connection back to the OOD server, not vice versa.

Where do most institutions install OOD?

  • On the head node?
  • On a dedicated physical server in the same subnet as the compute nodes?

I can’t really speak to ‘most’ as I have no visibility into any center, only what I can glean from discourse topics and so on.

But I would say deploying on VMs is quite popular. Head node installations are viable, but you need to take a little care: folks are SSHing into the head node as well, so there’s a lot of CPU and memory competition between OOD and SSH users.

To chime in: our OOD instances live on the same network as our login/interface nodes, and our compute nodes live on a private network.

What configuration do you have in the ood config for determining node name, and what does your regex match on OOD look like?

/etc/ood/config/ood_portal.yml “host_regex”

and

/etc/ood/config/clusters.d/name.yml “set_host”

Hey @snowbird294 really appreciate the reply! Sounds like it might be possible then? :grinning_face:

Right now I have it open to any name just to make sure it works, but we have nodes with hostnames starting with ‘node’ and ‘gpu’, so I will be changing it in the future.

host_regex: '[^/]+'
node_uri: '/node'
rnode_uri: '/rnode'

In the ../clusters.d .. file I have the following (just realized I have set_host in there twice, not sure it matters?):

   batch_connect:
     basic:
       script_wrapper: |
         module purge
         %s
       set_host: "host=$(hostname -A | awk '{print $1}')"
     vnc:
       script_wrapper: |
         module purge
         export PATH="/usr/local/bin/TurboVNC:$PATH"
         export WEBSOCKIFY_CMD="/usr/local/bin/websockify/run"
         %s
       set_host: "host=$(hostname -A | awk '{print $1}')"

Thanks again!

There might be an issue with regex matching, based on the warning on this page:

Put a specific hostname in that regex and test with that specific host, to see if it works with the reverse proxy test:

ssh <name>
hostname -A | awk '{print $1}'
nc -l 54321

Then put the output of the hostname command in the regex, regenerate the portal config, reload, and navigate to the page (the port is the one nc is listening on):
myood.net/node/<name>/54321

If nothing shows up in the listener terminal, then you don’t have a route to the compute node at all, and you may want to bother your network person to get their thoughts.
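
If you would rather check the route without going through the browser, a plain TCP connect from the OOD server to the listener does much the same job. This is only a sketch: `can_connect` is a made-up helper name, and the host and port are whatever you used for the `nc -l` test above.

```python
import socket
import sys


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, unreachable networks
        # and timeouts alike.
        return False


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python3 check_route.py <name> 54321
    target_host, target_port = sys.argv[1], int(sys.argv[2])
    print(f"{target_host}:{target_port} reachable: {can_connect(target_host, target_port)}")
```

Run it on the OOD server while `nc -l 54321` is listening on the compute node. If it reports not reachable, Apache will not be able to proxy to that node either.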

OK, so I changed it to:

host_regex: '(node|gpu)\d+'

That should work, right?

Also, our OOD server is running in a VM cluster without direct access to the compute nodes, so I think that’s my main issue.
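
On the regex question, one way to sanity-check a candidate host_regex is to run it against the exact strings set_host produces. The sketch below assumes the expression has to match the whole host part of the URL (my understanding is that OOD places it between `/node/` and `/<port>` in the generated Apache config, but treat that as an assumption). The hostnames are hypothetical, and Python's `re` stands in for Apache's regex engine.

```python
import re


def host_allowed(host_regex, hostname):
    """Check whether hostname, as a whole, matches host_regex."""
    # Assumption: the regex must cover the entire host string, because it
    # sits between "/node/" and "/<port>" in the proxy route.
    return re.fullmatch(host_regex, hostname) is not None


HOST_REGEX = r"(node|gpu)\d+"

if __name__ == "__main__":
    # Hypothetical names; substitute the real output of
    # hostname -A | awk '{print $1}' from a compute node.
    for name in ["node01", "gpu3", "node01.cluster.local", "login1"]:
        print(name, host_allowed(HOST_REGEX, name))
```

Under that assumption, short names like `node01` pass but a fully qualified name does not, so if `hostname -A` prints an FQDN on your nodes the regex would need to allow the domain suffix as well.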

Does your OOD server/instance have two interfaces?
One public and one private that connects to the same switch as your compute hosts?

It’s a single interface.

When you say no direct access, is there a route at all? Do the compute nodes resolve with “hostname node-xyz”? Can you ping one?

@snowbird294 Yeah, unfortunately they’re not reachable from the OOD instance, so I will need to work with our network/firewall team to make that private net routable.

OOD is effectively a login node. Most sites have one or more login nodes, a DTN node, and an OOD node, each with one interface that is public or exposed to an institute or campus network and another interface that is internal to the compute resources.

How do you normally have users log in and submit jobs?