Linux Host Adapter feedback

We got the Host Adapter functional in our OOD, and it’s working as it should, which is great. However, to make it work better in our setup, it would be good to consider a few enhancements and settings modifications that I describe below.

Before I do that, let me describe our setup. We have 8 “interactive” hosts, called Friscos (frisco1-8), that are independent from all clusters but use the same OS image as a cluster interactive node. They are set up so that users can do interactive work without having to run a job. These 8 hosts don’t all have the same hardware. We enforce usage limits on these nodes with Arbiter, https://gitlab.chpc.utah.edu/arbiter2/arbiter2, so we don’t anticipate needing OOD’s enforcement options.

  1. Round-robin hostname requirement. It would be good not to make it mandatory (as it seems to be now). Our Friscos don’t round robin. Right now, if I set submit_host: “frisco1”, the host adapter job only goes to frisco1, not to frisco2-8.

  2. Make it possible for the user to choose the host to run on. E.g., we could have a pull-down menu that lists the ssh_hosts from the host adapter’s clusters.d YAML file (see the sketch below).

Or, perhaps even in the current host adapter implementation, would it be possible to override the choice of the host to ssh into with an input from the form.yml?
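For reference, here is roughly what I’m imagining, based on our current host adapter cluster file. The hostnames, paths, and values below are placeholders for what we actually have, and the form.yml fragment is only a mock-up of the requested pull-down, not something the current adapter honors.

```yaml
# e.g. /etc/ood/config/clusters.d/frisco.yml -- sketch, values are placeholders
v2:
  metadata:
    title: "Frisco"
  login:
    host: "frisco1.chpc.utah.edu"
  job:
    adapter: "linux_host"
    submit_host: "frisco1.chpc.utah.edu"  # today every session lands on this one host
    ssh_hosts:                            # the list a pull-down could be built from
      - frisco1.chpc.utah.edu
      - frisco2.chpc.utah.edu
      # ... frisco3-8
    site_timeout: 7200
    singularity_bin: /usr/bin/singularity
    tmux_bin: /usr/bin/tmux
    strict_host_checking: false
```

```yaml
# Hypothetical form.yml fragment -- a mock-up of the pull-down we'd like,
# not something the adapter supports today
attributes:
  frisco_host:
    widget: select
    label: "Frisco host"
    options:
      - ["frisco1", "frisco1.chpc.utah.edu"]
      - ["frisco2", "frisco2.chpc.utah.edu"]
form:
  - frisco_host
```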

  3. Any thoughts on limiting the number of sessions per user on each host, or would this be too complicated to put into OOD?

  4. The containerized host launch works great, especially with the system directory bind mounts, but because we run in a container, we don’t see other users’ processes on the (shared) system. We usually recommend that people on the Friscos check the load on the system with tools like “top” to get an idea of how busy it is. They won’t see the whole system load with “top” when it’s run from inside the container, which may make them think the system is less busy than it actually is. I can’t think of a tool that would be able to see the host processes from inside of a container.

This is not a big issue for us since we enforce usage with Arbiter, but it will be confusing to users who try to use the Friscos via OOD the same way they were used to with direct SSH.

Now a couple of questions:

  1. What’s the difference between site_timeout and bc_num_hours?

  2. How do we customize the descriptions in /var/www/ood/apps/sys/bc_desktop to be specific to the Linux Host Adapter?

Thanks,
MC

Thanks for the feedback!

1 and 2. Round robin and submit host choice. I think those are good features we’d want, and I can submit an enhancement ticket for them.

  3. Limiting sessions. That is a bit out of scope for OOD. We may be able to do it, but I’d rather think about how to do it on the host itself. It looks like sshd has a MaxSessions parameter that may do just this.
  4. We implemented the container because we couldn’t reliably kill the whole process tree without it; that is the only reason we have the container in the stack. Killing the tmux process just wasn’t reliable enough. That said, I will check whether we actually need the -p option in singularity (see the sketch below).
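To illustrate what I mean about -p, here is a rough sketch (not the adapter’s actual launch command): the flag controls whether the container gets its own PID namespace, which is also what hides the host processes from “top” in your item 4.

```bash
# Rough sketch only -- not the adapter's actual command line.
# With -p/--pid the container runs in its own PID namespace,
# so `top` inside it sees only the container's processes.
singularity exec -p centos7.sif top

# Without -p/--pid the container shares the host PID namespace,
# so `top` would show the host's processes again.
singularity exec centos7.sif top
```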

---- Questions

  1. They’re the same thing, only with different precedence. It will use bc_num_hours if it exists and fall back to site_timeout otherwise. If both exist, it takes the smaller value.
  2. You can add a description field (rough sketch below). You can see here a description we use for shared virtual desktops.
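As a rough example (the file name and wording are placeholders, and your paths may differ), a per-cluster file under the bc_desktop app can carry a description along these lines:

```yaml
# e.g. /var/www/ood/apps/sys/bc_desktop/frisco.yml -- sketch, values are placeholders
title: "Frisco Desktop"
cluster: "frisco"
description: |
  Launches an interactive desktop on one of the Frisco interactive nodes.
  These are shared hosts; usage limits are enforced by Arbiter.
```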