Help with deploying Jupyter session to Kubernetes

I am trying to deploy a Jupyter session to a Kubernetes cluster with no success.

I have found the following thread about setting up Jupyter and Kubernetes; however, when I create the session I am presented with the following issue:

Failed to submit session with the following error:

Error from server (NotFound): the server could not find the requested resource

  • If this job failed to submit because of an invalid job name please ask your administrator to configure OnDemand to set the environment variable OOD_JOB_NAME_ILLEGAL_CHARS.
  • The Jupyter (Kubernetes) session data for this session can be accessed under the staged root directory.

The error log from the apache user shows the following:

App 465700 output: [2023-07-31 22:02:01 +0000 ] INFO "execve = [{}, \"kubectl --kubeconfig=/home/shared/testuser/.kube/config --namespace=testuser -o json create -f -\"]"
App 465700 output: [2023-07-31 22:02:01 +0000 ] ERROR "ERROR: OodCore::Job::Adapters::Kubernetes::Batch::NotFoundError - Error from server (NotFound): the server could not find the requested resource"

I can confirm that the user's namespace has been created within the cluster and that there is a .kube/config for the user. We are using OIDC to authenticate with OOD, but we don't use OIDC for the Kubernetes cluster, so we have set the cluster config to use managed auth:

# /etc/ood/config/clusters.d/my_k8s_cluster.yml
---
v2:
  metadata:
    title: "Kubernetes"
  job:
    adapter: "kubernetes"
    bin: "kubectl"
    all_namespaces: false
    auto_supplemental_groups: false
    server:
      endpoint: "https://<IP FOR CLUSTER>:6443"
      cert_authority_file: "/etc/pki/tls/certs/kube.crt"
    mounts: []
    auth:
      type: "managed"
  batch_connect:
    ssh_allow: false

Now, if I run the same command that appears in the apache logs

kubectl --kubeconfig=/home/shared/testuser/.kube/config --namespace=testuser -o json create -f -

I am presented with the same error as the frontend, which is what I expected, since no file is being piped in to apply to the Kubernetes cluster.

If I manually submit the pod.yml that is generated in the output directory, the pods start deploying.

The Jupyter application that I'm trying to deploy is bc_k8s_jupyter, with the submit.yml.erb and form.yml updated for our use case.

What am I missing to get the last parts to join together and submit the YAML to Kubernetes?

Sorry for the issues.

Looking into our docs more for how to set this all up: have you already worked through
https://osc.github.io/ood-documentation/latest/installation/resource-manager/kubernetes.html#deploy-hooks-to-bootstrap-users-kubernetes-configuration

It looks like some intricate setup is needed between Kubernetes and OOD for users to pick up the configs and for the permissions to be correct to read them.

Have you done anything with the PUN pre hooks at this point?
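
For context, the PUN pre-hook is wired up in ood_portal.yml. A minimal sketch, assuming the bootstrap script is installed at /opt/ood/hooks/k8s-bootstrap.sh (that path and the exported variable are assumptions for illustration):

# /etc/ood/config/ood_portal.yml (sketch)
# Command run as root before each user's PUN starts;
# OOD invokes it with "--user <username>" appended.
pun_pre_hook_root_cmd: '/opt/ood/hooks/k8s-bootstrap.sh'
# Apache environment variables to pass through to the hook
# (e.g. the OIDC access token set by mod_auth_openidc).
pun_pre_hook_exports: 'OIDC_ACCESS_TOKEN'

After changing ood_portal.yml you'd regenerate the Apache config (e.g. with update_ood_portal) and restart Apache for the hook to take effect.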

You'll also have to modify /etc/ood/config/hook.env, because the hooks Open OnDemand provides require a `HOOK_ENV` environment variable pointing at that file.
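
As an illustration of the shape of that file: it's a plain KEY=value env file that the hook scripts read. The variable names below are placeholders only; the exact names the shipped hooks expect are listed in the docs page linked above:

# /etc/ood/config/hook.env (sketch; variable names are illustrative)
# Kubernetes API endpoint the bootstrap hook should target
K8S_API_SERVER_URL="https://<IP FOR CLUSTER>:6443"
# Prefix for the per-user namespaces the hook creates
NAMESPACE_PREFIX=""
# CIDR the generated NetworkPolicy should allow traffic from
NETWORK_POLICY_ALLOW_CIDR="10.0.0.0/8"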

I think the user has to be provisioned using those pre-hooks somehow. The docs jump around a bit, but what I'm getting is that there has to be some work with the pre-hooks to set the user or network up correctly so that the fake-out with k8s works right (it's not a real scheduler; this is all forcing it to act like one, hence the strangeness).

I am just not sure what the env vars to do this are and don't see them in the docs. Did you already check the CIDR env var, as in the previous post, to ensure the network is reachable?

Are you on the web node when you issue the kubectl command that worked, or are you on a compute node?

No worries about the issues; it's part of the fun to get it to run.

With regards to your first reply: yes, we have set up all the PUN hooks needed for the Kubernetes side.

I am able to delete both the namespace and the .kube/config for the user, and upon login the files and namespace are regenerated.

To your second reply: we are running it on the web node, and it succeeds, but only when I specify the file to submit to the Kubernetes cluster.

So our setup is a web node running OOD, an NFS server to share the users' home directories, and a Kubernetes cluster with 1 control plane and 2 workers.

The web node can successfully communicate with the Kubernetes cluster when I run the commands as root, since we have set it up that way using the managed auth option. I understand that when it's in managed mode it doesn't set the context or cluster, as that is managed outside of OIDC and OOD, so it basically just runs the commands as root but specifies the OOD user's namespace.

Nothing runs as root in Open OnDemand*. When you log in, the system has booted up under your UID/GID(s). So when it issues kubectl, it does so as the user who's logged in.

I would suggest you reattempt the test by issuing the same command as the same user and on the same machine.

*Well, OK, the nginx stage runs as root so that nginx can create a process tree not as UID 0 but as your UID.

We pass the pod.yml to the kubectl command through standard input, so just using -f pod.yml should suffice as the equivalent.
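
To make the equivalence concrete, a quick sketch using the paths already quoted in this thread (it's the same request either way, only the source of the manifest differs):

# What OOD does: stream the generated manifest over standard input
cat pod.yml | kubectl --kubeconfig=/home/shared/testuser/.kube/config \
  --namespace=testuser -o json create -f -

# Equivalent manual test: point kubectl at the file directly
kubectl --kubeconfig=/home/shared/testuser/.kube/config \
  --namespace=testuser -o json create -f pod.yml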

So, if I run the command as testuser, who is our current OOD user, I get the following response:

testuser@ood:~$ kubectl --kubeconfig=/home/shared/testuser/.kube/config \
--namespace=testuser -o json create -f \
/home/shared/testuser/ondemand/data/sys/dashboard/batch_connect/dev/bc_k8s_jupyter-2/output/b138bd15-49e7-4773-a45f-cd72b48aac78/pod.yml
Error from server (NotFound): the server could not find the requested resource

If I run `kubectl config view` as testuser, I can see the context is not set.

Is this where I have gone wrong? Should the context already be set, or, since we have used managed auth, is it on us to ensure the context is set?

After a bit of testing around the user's .kube/config, I can confirm it is my issue with regards to setting the context for the cluster.
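
For anyone hitting the same wall: the missing pieces can be written into the user's kubeconfig with the standard kubectl config subcommands. A minimal sketch, reusing the endpoint and CA file from the cluster config above (the names my-cluster and testuser-ctx are illustrative, and under managed auth the credentials for the user entry have to be provisioned separately, e.g. via kubectl config set-credentials):

# Register the cluster in the user's kubeconfig
kubectl config --kubeconfig=/home/shared/testuser/.kube/config \
  set-cluster my-cluster \
  --server=https://<IP FOR CLUSTER>:6443 \
  --certificate-authority=/etc/pki/tls/certs/kube.crt

# Tie the cluster, user, and namespace together in a context
kubectl config --kubeconfig=/home/shared/testuser/.kube/config \
  set-context testuser-ctx \
  --cluster=my-cluster --user=testuser --namespace=testuser

# Make it the current context so plain kubectl calls resolve
kubectl config --kubeconfig=/home/shared/testuser/.kube/config \
  use-context testuser-ctx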

Thank you for pointing me in the right direction.

I am able to submit a kubectl command to the cluster when logged in as my testuser. It's now on me to configure everything correctly within our environment.
