I’m wondering if I can set up Open OnDemand to run in a Docker Swarm cluster. I have previously set up a whole demo Slurm cluster (where the master and compute nodes all ran as Docker containers) with Open OnDemand, running in docker-compose. This time I want Open OnDemand to use our actual Slurm cluster, with the app itself hosted in our Docker Swarm cluster.
So that’s my general question, and I have some specific sub-questions:
Do I need to fully set up the Open OnDemand machine as a login node? Could I instead set up wrappers for my Slurm commands (sbatch, srun, sacct, etc.) that ssh to a real login node and run them there?
Our Docker Swarm cluster is using Traefik for service discovery and SSL termination. Can I disable the SSL termination in Open OnDemand so that I can let Traefik handle it?
I have an existing Dockerfile that I developed for my previous demo version, but I’m wondering if there is a Dockerfile out there that I can look at as an example of how to run a production instance of OnDemand in Docker.
The issue here is UID matching. If you have wrapper scripts for sbatch, then the user issuing that command (or the ssh command) has to have the UID of the actual user, so the container itself needs to fork and become that real user. That leaves you with a UID mapping problem, which is likely going to require the SSSD stack mounted into the container.
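To make the wrapper idea concrete, here is a minimal sketch in Python of what such a wrapper might look like. It is only an illustration under assumptions: the host name login.example.org and the function names are hypothetical, and it assumes the process is already running as the real user (so the UID mapping above is solved) and that this user has non-interactive ssh access to the login node.

```python
import getpass
import shlex
import subprocess

# Hypothetical login node; replace with a real login host for your cluster.
LOGIN_HOST = "login.example.org"


def build_ssh_command(slurm_cmd, args, user=None, host=LOGIN_HOST):
    """Build the ssh argv that runs a Slurm command on the login node.

    The remote user defaults to the local user running the wrapper, which
    is why the UID/username inside the container has to match the real user.
    """
    user = user or getpass.getuser()
    # Quote each argument so spaces and shell metacharacters survive the
    # remote shell that ssh starts on the login node.
    remote = " ".join(shlex.quote(a) for a in [slurm_cmd, *args])
    # BatchMode=yes makes ssh fail instead of prompting for a password.
    return ["ssh", "-o", "BatchMode=yes", f"{user}@{host}", remote]


def run_wrapped(slurm_cmd, args):
    """Run the Slurm command remotely and return its exit code."""
    return subprocess.run(build_ssh_command(slurm_cmd, args)).returncode
```

A script installed as sbatch inside the container could then call run_wrapped("sbatch", sys.argv[1:]) and exit with the returned code. Standard input is passed through by ssh, but paths in the arguments only make sense if the same file systems are mounted on both sides.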
Probably - but you’d need this hack so that OOD will respond to HTTP requests.
I don’t have one, but I do know you should install the packages in the container (deb or rpm). Building OOD from source in the container is discouraged.
@dtenenba We are currently using OOD in Docker at Brown. We have an external-facing VM that NFS-mounts our HPC file systems, including Slurm, into the container, and we run munge and NIS within the container for authentication to Slurm. Essentially, we recreated the VM in a container with system services including NIS and munge to ensure that we can correctly authenticate users. Our main system is also running shib for campus-wide login. Our Docker repos are private because they contain sensitive info, but I’d be happy to share our Dockerfile with you.
@dtenenba I’d also be happy to hop on a Zoom call sometime to walk through our setup if that’s more helpful. There are some specifics related to how the overall system is set up within the organization, and hopefully I can help untangle some of them.