OnDemand multiple connectors

Hi,
We have an on-premise OnDemand cluster that uses the Slurm connector, and we are planning to expand to Kubernetes in AWS (driven by requirements such as GPU requests). The question is: can OnDemand have multiple connectors, i.e. can we add a Kubernetes connector alongside the Slurm connector? If it's possible, do you know of anyone who has done this before who could help? Any help or resources would be appreciated.

Faras

Hi and welcome!

Sure, all the files in cluster.d are completely separate entities. We run Slurm, Kubernetes, and a Linux host adapter at OSC (and in fact there was a time when we ran Torque and Slurm while we transitioned from one to the other).
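For instance, here's a minimal sketch of what two side-by-side cluster.d files might look like (the file names, hostnames, and titles are made up; only the adapter keys follow the usual Open OnDemand cluster config layout):

# /etc/ood/config/clusters.d/slurm_cluster.yml
v2:
  metadata:
    title: "Slurm Cluster"
  login:
    host: "login.example.edu"
  job:
    adapter: "slurm"
    bin: "/usr/bin"

# /etc/ood/config/clusters.d/k8s_cluster.yml
v2:
  metadata:
    title: "Kubernetes Cluster"
  job:
    adapter: "kubernetes"
    # ... Kubernetes-specific settings (server endpoint, auth, namespaces, etc.)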

That said, we don't have official support for EKS. I had set up GKE a little when I originally wrote the adapter, but haven't tested it much since. Let us know if the current authentication modes for Kubernetes work for you or if you need some special initialization like GKE does.

Hi Jeff

Thanks for the quick reply, this is very helpful. I will set up the AWS EKS connector and let you know. Would you mind sharing the special initialization that you needed to do for GKE?

Faras

Hi Jeff,

The document here:
Configure Kubernetes — Open OnDemand 2.0.13 documentation
talks about a service account in GCP:


v2:
  job:
    # …
    auth:
      type: 'gke'
      svc_acct_file: '~/.gke/my-service-account-file'

Did you write this 'gke' auth adapter? And does every user have to have a service account in GCP in order to use GKE?

We at Harvard are working on connecting Open OnDemand to AWS EKS. It would be helpful if you could share your GKE adapter and configuration with us.
Many thanks

Yes, probably to both. I developed the Kubernetes adapter initially against GKE because we had credits at the time. We've since deployed an on-prem cluster, so a lot of that knowledge may be lost.

OOD expects to deploy pods into a user's namespace only. We're still trying to maintain that user-level privilege by restricting each user to their own namespace, hence all the bootstrapping policies, rolebindings, and so on.
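As a rough illustration of the kind of rolebinding involved (a minimal sketch; the namespace and username below are hypothetical, and OnDemand's actual bootstrapping may differ):

# Bind a single user to the built-in 'admin' ClusterRole, scoped to
# that user's own namespace only.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: jdoe-namespace-admin
  namespace: user-jdoe        # the user's personal namespace (made-up name)
subjects:
  - kind: User
    name: jdoe                # made-up username
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: admin                 # Kubernetes' built-in aggregated admin role
  apiGroup: rbac.authorization.k8s.io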

If they all share a service account file, that kind of defeats the purpose, right? I mean, you could use a global one, but clearly there's a security risk there.

Here's a playbook I was using at the time to spin up and tear down clusters. Pod Security Policies have since been deprecated, so I'm not really sure what sort of replacement AWS may have.
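For what it's worth, upstream Kubernetes replaced Pod Security Policies with Pod Security Admission, which is configured through namespace labels rather than a separate policy object. A minimal sketch (the namespace name is made up; the labels are the standard pod-security ones):

apiVersion: v1
kind: Namespace
metadata:
  name: user-jdoe
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/enforce-version: latest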

Of course, all that said, we don't support AWS yet. Though pull requests are welcome!

Here’s the relevant portion of what we do for GKE to set itself up. We primarily just offload it to gcloud.

/job/adapters/kubernetes/batch.rb#L344-L366