Installing OnDemand on Preexisting Login Servers?

Hello All,

We are looking to deploy OnDemand on our latest cluster at WVU. My apologies if I already missed this, but is it recommended to deploy OnDemand on its own dedicated “login host”, or can we share login hosts we already have deployed? I know we can share, but should we?

Thanks in advance!

Nate

Welcome Nate!

[Edit: this is my personal opinion about VMs. Installing it on an existing login node is OK; it seems to be a matter of organizational preference. Whatever the choice, the host machine (virtual or not) should be treated just like a login node from a security perspective - any sensitive files, binaries, etc. have to be locked down with the appropriate file permissions and ACLs.]

The best practice is to stand up a VM to install on, for isolation reasons: not only user/memory/network/process/maintenance isolation, but also isolating your login node from all the RPM installations.

An added benefit of VMs is lower resource usage. OSC’s login nodes are huge, with 250 GB of RAM and 28 cores. OOD doesn’t need nearly that much, though someone else will have to chime in with what it does need. (I’m guessing out of thin air here that ~20-30 GB and ~4 cores is enough, but again, it’s just a guess; I’ll try to confirm what an appropriate size is.)

Oh! And upgrades. A lot of folks have prod and test instances, where they can test out configs or changes before deploying them to their users. Again, VM isolation is very good for this.

Like everything else in this world, it’s a trade-off: you’re trading hardware for isolation. Smaller sites may need to install directly on the login or head nodes simply because they can’t spare the hardware for a VM. The VM approach gives you isolation in all sorts of dimensions, but at the cost of hardware and often underutilized resources.

Sizing seems to depend on the number of clients and your organization’s willingness to provide buffer resources.

We use a 60 GB VM, but normal peaks tend to be around the 10 GB mark. Obviously we have a lot of headroom, but we did hit ~25 GB once or twice. OOD is not super CPU-intensive, so you could probably get by with 4 cores, though 6 or 8 may give you and your users a lot of comfort. OSC runs 16, but again, we have a lot of headroom and we top out at ~25% utilization.
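
If you want to ground your own sizing in observed usage, a rough check on the web host could look like the sketch below. It assumes a default install where the PUN processes show up as nginx and Passenger; the process-name pattern is an assumption, so adjust it for your site.

```bash
# Rough per-user memory footprint of the PUN processes (RSS is in KB).
# The nginx/Passenger pattern is an assumption; verify with `ps` first.
ps -eo user:20,rss,comm | awk '
  /nginx|Passenger/ { sum[$1] += $2 }
  END { for (u in sum) printf "%-20s %8.1f MB\n", u, sum[u]/1024 }'
```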

Another bit about sizing, though, is the filesystems. It seems the /tmp filesystem needs to be fairly large (50 GB) because that’s where uploads are processed.
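
A hedged sketch of keeping an eye on that - the mount point is real, but the volume size and device name below are just examples, not requirements:

```bash
# Check how much room uploads have to work with:
df -h /tmp

# Example /etc/fstab line giving /tmp its own volume so a large upload
# can't fill the root filesystem (device/size/options are placeholders):
# /dev/vg0/tmp  /tmp  xfs  defaults,nodev,nosuid  0 0
```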

So that’s sizing for a site that gets a lot of use. Obviously, more or fewer users means more or fewer resources required. Hope that helps!

@negregg see Jeff’s edit to his response. I didn’t get an email from Discourse after he edited the response. The summary is that it’s okay to run OnDemand on the login node, but take care of the security concerns. See above for details.

Also, for context on the VM resource details @jeff.ohrstrom mentions: our OnDemand instance serves over 600 unique users each month, and at any given time we usually have 60-100 Per-User NGINX (PUN) processes running. The Passenger apps that make up the core of OnDemand (which NGINX is configured with) are each killed after a short period of inactivity from the user, and when users are using noVNC or connecting to Jupyter Notebook or RStudio on a compute node, Apache proxies those users directly, bypassing the PUN completely. So it can happen that 60 PUNs are running but twice that number of users are actually being served.
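
If you want to see the same numbers on your own instance, a hedged sketch (nginx_stage lives at this path on a default install; verify the subcommands against your version’s docs):

```bash
# List the running PUNs and count them:
sudo /opt/ood/nginx_stage/sbin/nginx_stage nginx_list | wc -l

# Clean up PUNs whose users have no active sessions:
sudo /opt/ood/nginx_stage/sbin/nginx_stage nginx_clean
```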

Awesome info … thanks, all! Great information, and looking forward to getting this off the ground. Something that has been on our short list for way too long!

Nate

To return to the idea in the topic title… I would like to know if there is any possibility of spawning the PUN remotely.
What do I mean? I would like to avoid turning the OOD web server into a login node, which means mounting users’ home, working, and files directories, munge keys, etc.

Since we already have a hardened login node with everything mentioned above, it would be easier to just add an “ood-agent” on the login node which would receive the request from the OOD web server (Apache) and spawn the PUN on the existing login node as the current user.

With that architecture, the OOD Apache frontend and the PUN would not run on the same server.

I’ve seen that Lua scripts are responsible for starting NGINX as $user. Could a Lua script remotely access the login node as a user and create an NGINX session there? Would Apache be able to redirect its traffic? After authenticating, how would Apache connect to the login node and spawn the PUN there?
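
Concretely, what I imagine the hypothetical ood-agent doing is something like the sketch below. The nginx_stage path is from a default install (the same command admins can use to pre-start a PUN); the SSH wrapper is purely hypothetical.

```bash
# Start (or restart) the per-user NGINX for a given user on this host:
sudo /opt/ood/nginx_stage/sbin/nginx_stage pun -u "$USER"

# What an "ood-agent" might amount to: running the same thing on the
# login node instead of the web host (hypothetical, not supported today):
# ssh login01 sudo /opt/ood/nginx_stage/sbin/nginx_stage pun -u "$USER"
```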

These are some questions I’m trying to figure out, and it would be nice to discuss them with you all :slight_smile:

This is an interesting idea. There are several blockers to consider.

  1. If we could start the PUNs on the login nodes, the PUNs would still need access to the config in /etc/ood/config and the deployed app code at /var/www/ood/apps/sys, both of which are currently only deployed to the web hosts. At some point it would appear that the bulk of OnDemand would be installed on the login node, with only Apache running “remotely” on a web host.

  2. The benefit we gain from having Apache and the PUN on the same host is that they can communicate with each other through Unix domain sockets, which secures the communication between Apache and the PUNs because the PUNs are not listening on TCP ports. If there were an ood-agent that started PUNs, we would need to support mutual TLS authentication between the PUN and Apache (more on this at the bottom of this post). A quick way to see those sockets is sketched below.
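
For concreteness, a hedged sketch - the socket path assumes a default install and may vary by version/site:

```bash
# The PUN listens on a Unix domain socket rather than a TCP port
# (exact path is an assumption; check your Apache/PUN configs):
ls -l /var/run/ondemand-nginx/*/passenger.sock

# Confirm the PUNs have no TCP listeners:
ss -tlnp | grep -i nginx || echo "no PUN TCP listeners"
```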

Your goals are:

  1. avoid needing to set up the OnDemand host as a submit host for the batch scheduler
  2. avoid needing to mount the file systems on the OnDemand host

Is there anything else I missed?

If those are the goals, another approach might be to provide an abstraction in the form of two per-user services, a job management service and a file management service, that the OnDemand apps use. The apps would use these services for all their file management and job management.

We do sort of support this now for job management. See “Submit to scheduler on separate node from OnDemand node” for a description of this. We don’t have a per-user service that runs, but you can run the commands via SSH on another node.
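
For example, here is a hedged sketch of a cluster config that runs the scheduler commands over SSH on a separate submit host, so the web host doesn’t need to be a submit host itself. The hostnames are placeholders, and the exact keys supported depend on your adapter and OnDemand version, so check the docs.

```yaml
# /etc/ood/config/clusters.d/mycluster.yml (sketch)
---
v2:
  metadata:
    title: "My Cluster"
  login:
    host: "login01.example.edu"
  job:
    adapter: "slurm"
    bin: "/usr/bin"
    submit_host: "login01.example.edu"  # scheduler commands run here via SSH
```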

On mutual TLS authentication:

We do have a similar problem with Apache and the interactive apps. We want to add mutual TLS authentication between Apache and the interactive apps running on compute nodes (like RStudio and Jupyter), supported the same way regardless of which app you are running. To do this we are thinking of some kind of service mesh, where a sidecar proxy runs alongside Jupyter or RStudio and Apache proxies to that sidecar. Or perhaps we would have two solutions: one for web servers, and one for VNC servers where a sidecar already exists - websockify - that we could figure out how to add mutual TLS authentication to.

A good solution for this service mesh, one that keeps installation and maintenance of OnDemand as simple as it is today, could possibly be leveraged for mutual TLS between the Apache on the web host and any per-user server or service running on the login nodes.

Some projects of interest are https://www.consul.io/ and the https://www.cncf.io/ projects. But this investigation is slow going, mostly because many of the service mesh solutions out there are too complex for OnDemand, and those that might be candidates have so many configuration options that it would take significant investment to avoid https://owasp.org/www-project-top-ten/OWASP_Top_Ten_2017/Top_10-2017_A6-Security_Misconfiguration
