I am trying to deploy a Jupyter session to a Kubernetes cluster with no success.
I have found an earlier thread about setting up Jupyter and Kubernetes; however, when I create the session I am presented with the following error:
Failed to submit session with the following error:
Error from server (NotFound): the server could not find the requested resource
If this job failed to submit because of an invalid job name please ask your administrator to configure OnDemand to set the environment variable OOD_JOB_NAME_ILLEGAL_CHARS.
The Jupyter (Kubernetes) session data for this session can be accessed under the staged root directory.
The error log from the apache user shows the following:
App 465700 output: [2023-07-31 22:02:01 +0000 ] INFO "execve = [{}, "kubectl --kubeconfig=/home/shared/testuser/.kube/config --namespace=testuser -o json create -f -"]"
App 465700 output: [2023-07-31 22:02:01 +0000 ] ERROR "ERROR: OodCore::Job::Adapters::Kubernetes::Batch::NotFoundError - Error from server (NotFound): the server could not find the requested resource"
I can confirm that the user's namespace has been created within the cluster and that there is a .kube/config for the user. We are using OIDC to authenticate with OOD; however, we don't use OIDC for the Kubernetes cluster itself, so we have set the cluster config to use managed auth.
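The relevant part of the cluster definition is along these lines. This is only a sketch: the file path, title, and config_file value below are placeholders rather than our exact file, and the key piece is the managed auth type:

```bash
# Sketch only: the path, title, and config_file below are assumptions, not
# our exact config. The important part is auth type "managed", which tells
# OOD that cluster credentials and contexts are handled outside OOD/OIDC.
sudo tee /etc/ood/config/clusters.d/k8s.yml >/dev/null <<'EOF'
---
v2:
  metadata:
    title: "Kubernetes"
  job:
    adapter: "kubernetes"
    config_file: "~/.kube/config"
    auth:
      type: "managed"
EOF
```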
It looks like some intricate setup is needed between Kubernetes and OOD for the users to pick up the configs and to have the correct permissions to read them.
Have you done anything with the PUN pre hooks at this point?
You'll also have to modify /etc/ood/config/hooks.env because the Open OnDemand-provided hooks require a `HOOKENV` environment variable.
I think the user has to be set up through those pre-hooks somehow. The docs jump around a bit, but my read is that there has to be some work in the pre-hooks to set the user or network up correctly so that the fake-out with k8s works right (it's not a real scheduler; all of this is forcing it to act like one, hence the strangeness).
I am just not sure which environment variables do this, and I don't see them in the docs. Did you already check the CIDR environment variable from the previous post to ensure the network is reachable?
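If you want a quick sanity check of basic reachability from the web node, something like the following would tell you whether the API server answers at all (the endpoint below is a placeholder for your control plane address):

```bash
# Ask the cluster for its endpoints using the user's kubeconfig
# (same path as in your error log).
kubectl --kubeconfig=/home/shared/testuser/.kube/config cluster-info

# Or hit the API server's health endpoint directly; the host and port here
# are placeholders for your control plane.
curl -k https://k8s-control-plane.example.org:6443/healthz
```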
No worries about the issues; it's all part of the fun of getting it to run.
With regards to your first reply, yes, we have set up all the PUN hooks needed for the Kubernetes stuff.
I am able to delete both the namespace and the user's .kube/config, and upon login the files and namespace are regenerated.
To your second reply, we can run it successfully on the web node, but only when I specify the file to submit to the Kubernetes cluster.
So our setup is a web node running OOD, an NFS server sharing the users' home directories, and a Kubernetes cluster with one control plane and two workers.
The web node can successfully communicate with the Kubernetes cluster when I run the commands as root, since we have set it up that way using the managed auth option. My understanding was that in managed mode it doesn't set the context or cluster, as that is managed outside of OIDC and OOD, so it basically just runs the commands as root but specifies the OOD user's namespace.
Nothing runs as root in Open OnDemand*. When you log in, the system has booted up under your UID/GID(s), so when it issues kubectl it does so as the user who is logged in.
I would suggest you reattempt the test by issuing the same command as the same user and on the same machine.
*Well, OK, the nginx stage runs as root so that nginx can create a process tree not as UID 0 but as your UID.
We pass the pod.yml to the kubectl command through standard input, so just using -f pod.yml should suffice as the equivalent.
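To illustrate, the adapter effectively does the first form below (mirroring the exact command from your error log), and the second form is the equivalent manual test:

```bash
# What the adapter effectively runs: the manifest arrives on stdin (-f -).
kubectl --kubeconfig=/home/shared/testuser/.kube/config \
  --namespace=testuser -o json create -f - < pod.yml

# Equivalent manual test: point -f at the staged pod.yml instead of stdin.
kubectl --kubeconfig=/home/shared/testuser/.kube/config \
  --namespace=testuser -o json create -f pod.yml
```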
So if I run the command as testuser, who is our current OOD user, I get the following response:
testuser@ood:~$ kubectl --kubeconfig=/home/shared/testuser/.kube/config \
--namespace=testuser -o json create -f \
/home/shared/testuser/ondemand/data/sys/dashboard/batch_connect/dev/bc_k8s_jupyter-2/output/b138bd15-49e7-4773-a45f-cd72b48aac78/pod.yml
Error from server (NotFound): the server could not find the requested resource
If I run `kubectl config view` as testuser I can see the context is not set.
Is this where I have gone wrong? Since we have used managed auth, is it on us to ensure the context is set?
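If so, I am assuming the fix is roughly along these lines against the user's kubeconfig; the cluster name, server endpoint, CA file, and token below are placeholders for whatever our environment actually uses:

```bash
# Sketch only: names, endpoint, CA file, and token are placeholders.
# This writes a cluster, a user, and a context into the user's kubeconfig
# and then selects that context as the current one.
KCONF=/home/shared/testuser/.kube/config

kubectl --kubeconfig="$KCONF" config set-cluster ood-k8s \
  --server=https://k8s-control-plane.example.org:6443 \
  --certificate-authority=/etc/kubernetes/pki/ca.crt \
  --embed-certs=true

kubectl --kubeconfig="$KCONF" config set-credentials testuser \
  --token="<service-account-token>"

kubectl --kubeconfig="$KCONF" config set-context ood-k8s \
  --cluster=ood-k8s --user=testuser --namespace=testuser

kubectl --kubeconfig="$KCONF" config use-context ood-k8s
```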
After a bit of testing around the user's .kube/config, I can confirm the issue is on my side with regards to setting the context for the cluster.
Thank you for pointing me in the right direction.
I am able to submit a kubectl command to the cluster when logged in as my testuser. It's now on me to configure everything correctly within our environment.