OOD portal with Slurm as a resource manager/two clusters

Not to make it more confusing - but here are two separate approaches you could use. In both you’d still have to have two OOD cluster configurations.

In the first, you specify bin_overrides in the cluster config and point them at an ssh wrapper. The wrapper shells into the login node of the appropriate cluster and executes the Slurm commands there. This way you don’t have to worry much (actually not at all) about Slurm configuration on the web portal’s node. There’s a description of how to do that in this topic. You can also search this site for “ssh wrapper”, since it has come up before.

Note that users have to be able to ssh from the web node to the login node without being prompted for this to work.
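
To make the first approach a bit more concrete, here is a minimal sketch of what such a wrapper could look like in Python. It is not the script from the linked topic, just an illustration: the login host name, the remote sbatch path, and the function names are all placeholders I made up.

```python
#!/usr/bin/env python3
"""Hypothetical ssh wrapper for sbatch (an illustrative sketch only)."""
import os
import shlex
import sys

# Assumptions: placeholder login host and remote binary path, one pair per cluster.
LOGIN_HOST = "login.cluster1.example.edu"
REMOTE_SBATCH = "/usr/bin/sbatch"


def build_ssh_command(host, remote_binary, args):
    """Return the argv that runs remote_binary with args on host over ssh."""
    # ssh joins its trailing arguments into one string for the remote shell,
    # so quote each piece to keep arguments containing spaces intact.
    remote_command = shlex.join([remote_binary, *args])
    # BatchMode=yes makes ssh fail instead of prompting for a password.
    return ["ssh", "-o", "BatchMode=yes", host, remote_command]


def main():
    argv = build_ssh_command(LOGIN_HOST, REMOTE_SBATCH, sys.argv[1:])
    # Replace this process with ssh so stdin, stdout and the exit code pass through.
    os.execvp(argv[0], argv)
```

Saved as an executable file with a call to `main()` as its last line, this is the kind of script the sbatch entry under bin_overrides would point at. The other Slurm commands (squeue, scancel and so on) would each need an equivalent wrapper, and the second cluster would get its own set pointing at its own login node.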

The second approach is what you’re thinking of and describing: one binary, two daemons, and two configs (each using its own daemon). I think you should be able to copy the slurm.conf from each cluster to the portal node and only have to modify AuthInfo in one or both of the configs. Since the daemons listen on different sockets, the configs will likely have to reflect that. Starting them manually with CLI arguments would be very fragile. Using systemd would be a lot more robust, but you’d have to put work into ensuring each systemd unit (one per cluster daemon) is isolated from the other and always starts with the right configuration. Looks like there’s a CONF_FILE environment variable you can use.
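
For the client side of the second approach, here is a rough sketch of how the right config could be selected for each cluster. It relies on Slurm's client commands honoring the SLURM_CONF environment variable; whether that is the same variable as the one mentioned above is an assumption on my part. The cluster names, file paths, and function names are hypothetical.

```python
import os
import subprocess

# Assumption: hypothetical per-cluster copies of slurm.conf on the portal node.
SLURM_CONFS = {
    "cluster1": "/etc/slurm/cluster1/slurm.conf",
    "cluster2": "/etc/slurm/cluster2/slurm.conf",
}


def slurm_env(cluster, base_env=None):
    """Return a copy of the environment with SLURM_CONF set for this cluster."""
    env = dict(os.environ if base_env is None else base_env)
    env["SLURM_CONF"] = SLURM_CONFS[cluster]
    return env


def run_slurm(cluster, argv):
    """Run a Slurm client command (e.g. ["squeue"]) against the given cluster."""
    return subprocess.run(
        argv,
        env=slurm_env(cluster),
        capture_output=True,
        text=True,
        check=False,
    )
```

Something like `run_slurm("cluster2", ["squeue"])` would then talk to the second cluster, provided that cluster's copy of slurm.conf also carries the AuthInfo setting matching its daemon's socket.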

Having thought about this a little, the first approach seems a lot easier. The second option is probably viable, but you’d have to do it with automation. I imagine doing it by hand would be very hard and fragile, and in the end cause a lot of pain.

Hope that helps!