OnDemand and Rancher integration

Hi. I’m still looking at this and am curious: can you post an example JWT token that you receive from your Keycloak server, so I can compare the fields you have with what I’m producing? @tdockendorf

If it’s a matter of just making something up to make Rancher happy, I’m willing to give it a try. I’m just not sure what needs to be in the ‘spec’ object.

Looking at that Go code, the two parts of the token it’s looking for are the accesskey and secretkey, which I don’t even see in the JWT output.

I’m not sure what “spec” object you mean; maybe that’s something specific to Rancher?

This is how I get a token to authenticate with Kubernetes using a Keycloak token:

export PASSWORD=OMIT
TOKENS=$(curl -d 'grant_type=password' -d 'client_id=kubernetes' -d "client_secret=$(cat /opt/osc/etc/kubernetes-keycloak-secret-dev)" \
-d "username=$USER" -d "password=$PASSWORD" -d 'scope=openid profile offline_access groups' \
https://<KEYCLOAK HOST>/realms/osc/protocol/openid-connect/token); unset PASSWORD

Then I get something like this:

$ echo $TOKENS | jq .
{
  "access_token": "OMIT",
  "expires_in": 3600,
  "refresh_expires_in": 0,
  "refresh_token": "OMIT",
  "token_type": "Bearer",
  "id_token": "OMIT",
  "not-before-policy": 0,
  "session_state": "7be7d34e-c28c-465c-a051-ae6254471a2b",
  "scope": "openid groups email profile offline_access"
}
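
If it helps with comparing fields, the claims in the id_token can be dumped locally. Here’s a rough sketch, assuming bash, jq, and coreutils base64 (the jwt_claims helper name is just made up for illustration):

# Print the claims (payload) of a JWT. The payload is the second
# dot-separated field, base64url-encoded with the padding stripped.
jwt_claims() {
  local p
  p=$(cut -d. -f2 <<<"$1" | tr '_-' '/+')
  # Re-pad to a multiple of four characters so base64 -d accepts it.
  while [ $(( ${#p} % 4 )) -ne 0 ]; do p="${p}="; done
  printf '%s' "$p" | base64 -d | jq .
}

jwt_claims "$(echo "$TOKENS" | jq -r .id_token)"

That should show claims like iss, aud, preferred_username, and groups, which is what the API server matches against its oidc- flags.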

We configure the Keycloak integration on the Kubernetes side by passing flags to the Kubernetes API server. We use kubeadm to set up the cluster, so this is our cluster config section for the API server:

apiServer:
  timeoutForControlPlane: 4m0s
  certSANs:
  - OMIT
  extraArgs:
    oidc-issuer-url: "https://<KEYCLOAK HOST>/realms/osc"
    oidc-client-id: kubernetes
    oidc-username-claim: preferred_username
    oidc-username-prefix: "-"
    oidc-groups-claim: groups
    audit-policy-file: /etc/kubernetes/audit-policy.yaml
    audit-log-path: /var/log/kube-apiserver/audit.log
    audit-log-maxage: '60'
    audit-log-maxbackup: '30'
    audit-log-maxsize: '100'
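
As a sanity check, since kubeadm renders these into the kube-apiserver static pod, you can confirm the flags actually took effect. A quick sketch, assuming the standard kubeadm component label:

# List the live API server arguments and filter for the OIDC settings;
# kubeadm labels the static pod with component=kube-apiserver.
kubectl -n kube-system get pod -l component=kube-apiserver \
  -o jsonpath='{.items[0].spec.containers[0].command}' \
  | tr ',' '\n' | grep oidc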

Forgot to include how we take the Keycloak tokens and feed them into the kubeconfig:

# Pull the individual tokens out of the Keycloak response.
id_token=$(echo "$TOKENS" | jq -r .id_token)
access_token=$(echo "$TOKENS" | jq -r .access_token)
refresh_token=$(echo "$TOKENS" | jq -r .refresh_token)
# -p so this doesn't fail if the directory already exists.
mkdir -p ~/.kube
kubectl config set-cluster kubernetes-dev --server=https://<KUBERNETES API SERVER>:6443 --certificate-authority=/opt/osc/etc/kubernetes-ca-dev.crt
kubectl config set-context ${USER}@kubernetes-dev --cluster=kubernetes-dev --user=${USER}@kubernetes-dev
kubectl config use-context ${USER}@kubernetes-dev
kubectl config set-credentials ${USER}@kubernetes-dev --auth-provider=oidc \
--auth-provider-arg=idp-issuer-url=https://<KEYCLOAK SERVER>/realms/osc \
--auth-provider-arg=client-id=kubernetes \
--auth-provider-arg=client-secret=$(cat /systems/osc_certs/keycloak/kubernetes-dev.secret) \
--auth-provider-arg=refresh-token="$refresh_token" \
--auth-provider-arg=id-token="$id_token" \
--auth-provider-arg=extra-scopes=openid
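
After that, a quick way to sanity-check that the OIDC credentials work end to end (just an illustrative command; any read you have RBAC for will do):

# A yes/no answer from a SelfSubjectAccessReview proves the API server
# accepted and mapped the OIDC identity (a "no" still means auth worked).
kubectl --context=${USER}@kubernetes-dev auth can-i list pods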

Thanks, that was good info. I was able to duplicate what you have here for my environment and confirmed I’m getting the same data back from Keycloak… and I’m still getting the same error from the Kubernetes cluster that I’ve been getting all along (the “found 1 parts of token” one). I’m going to try a different Kubernetes install and see if I get different results.

Getting close now. I decided to blow away the RKE1 installation and went with K3s (I’m still looking for something that can ultimately be managed by Rancher, as it gives us good insight into the multiple clusters we run). Setting up K3s manually, I was able to get the OIDC settings in there, and both a test user and a domain user were able to use kubectl to get info, so that’s a start.

My K3s config (/etc/rancher/k3s/config.yaml) looks like this:

kube-apiserver-arg:
  - "oidc-issuer-url=https://keycloak.example.org/realms/OnDemand"
  - "oidc-client-id=test-cluster"
  - "oidc-username-claim=email"
  - "oidc-username-prefix=-"
  - "oidc-groups-claim=groups"
  - "oidc-ca-file=/etc/pki/ca-trust/source/anchors/domain.crt"
  - "audit-policy-file=/etc/rancher/k3s/audit.yaml"
  - "audit-log-path=/var/log/kube-apiserver/audit.log"
  - "audit-log-maxage=60"
  - "audit-log-maxbackup=30"
  - "audit-log-maxsize=100"

I also created a ClusterRoleBinding making my email address a cluster-admin (for testing):

kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: oidc-cluster-admin
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: User
  name: user.name@example.org
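
To apply and spot-check it (the filename is arbitrary, and impersonation requires that whatever admin credentials you use are allowed to impersonate):

kubectl apply -f oidc-cluster-admin.yaml
# Impersonate the OIDC user to confirm the binding grants cluster-admin.
kubectl auth can-i '*' '*' --as=user.name@example.org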

This is just for information (in case anyone else looks at this).

I was also testing with ‘kubelogin’ to verify that it would pop up a browser and use my domain credentials properly; I then verified in the audit log that RBAC was acting on behalf of my email address.
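
For reference, if it’s the int128 kubelogin plugin (installed via krew as kubectl oidc-login), its setup wizard walks through the browser flow and prints the kubectl config commands to wire it in as an exec credential plugin. Roughly:

# Runs the browser login once and emits the set-credentials commands.
kubectl oidc-login setup \
  --oidc-issuer-url=https://keycloak.example.org/realms/OnDemand \
  --oidc-client-id=test-cluster \
  --oidc-client-secret=OMIT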

The next step is to go back to my OnDemand system and try to get those pieces working, since I’m now confident the back end will work.

In following the directions, since I’m on a K8s version more recent than 1.24, I needed to add the ServiceAccount token manually:

apiVersion: v1
kind: Secret
metadata:
  name: ondemand-token
  namespace: ondemand
  annotations:
    kubernetes.io/service-account.name: ondemand
type: kubernetes.io/service-account-token
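
After applying that, the token controller fills in the Secret’s data, and the token can be pulled back out like this:

# The controller populates .data.token shortly after the Secret is created.
kubectl -n ondemand get secret ondemand-token \
  -o jsonpath='{.data.token}' | base64 -d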

Created issue #887 on OSC/ood-documentation (“Update Kubernetes docs for creating ServiceAccount token resource”) to update our docs with regard to that Secret resource. We started using Kubernetes with OnDemand well before 1.24, so OSC never ran into this.

There are a couple of typos and such in the Kubernetes documentation page that, if fixed, might make things a little clearer:

  • “Boot strapping the Kuberenetes cluster” - typo in the word Kubernetes (not a big deal)
  • In the deploy hooks section, you mention needing an /etc/ood/config/hooks.env file (with an ‘s’), but the name of the file is hook.env (without the s) - I assume it should be without?
  • You mention needing PUN pre-hooks, but don’t really get into what that should be set to based on your example, where it should go, etc.
  • Another typo further down in OIDC Authentication: the type is listed as ‘odic’ (not a big deal, just pointing it out)

I think I might be about 92% of the way there now. I have a generated $HOME/.kube/config file that has a token in it, which is good, but I notice there are no clusters or contexts defined in it, even though I’ve got the OIDC stuff in there, so I must have missed something. I don’t see anything in set-k8s-creds that adds the cluster info to the kubeconfig. Is that done anywhere else?

I’ve confirmed that the namespace gets created and the permissions are set on it. I authenticated externally to the Kubernetes API and tried to create a new namespace as my username, and got a permission denied; but if I created something like a secret in the namespace that OOD created, that worked. So I think it’s just about there, if I can get the cluster/cert info into $HOME/.kube/config.

The ood_core package that handles interfacing with Kubernetes is what deploys the majority of the kubeconfig for each user. The set-k8s-creds only does the OIDC part since it has to access the token from the OnDemand session startup.

The config is pulled from the Kubernetes cluster YAML on the OnDemand host; here’s an example of what OSC uses:

$ cat /etc/ood/config/clusters.d/kubernetes-dev.yml
---
v2:
  metadata:
    title: Kubernetes
    hidden: true
  job:
    adapter: kubernetes
    cluster: ood-dev
    bin: "/usr/local/bin/kubectl"
    username_prefix: dev-
    namespace_prefix: user-
    all_namespaces: false
    auto_supplemental_groups: true
    server:
      endpoint: https://<API SERVER>:6443
      cert_authority_file: "/opt/osc/etc/kubernetes-ca.crt"
    auth:
      type: oidc
  batch_connect:
    ssh_allow: false

Docs: Kubernetes — Open OnDemand 3.0.3 documentation

This is the error that was causing the cluster not to be populated:

App 5472 output: E, [2023-11-17T11:01:15.686813 #5472] ERROR -- : could not initialize k8s cluster kubernetes because of error 'error: mkdir ~: permission denied
App 5472 output: '
App 5472 output: [2023-11-17 11:01:15 -0500 ]  INFO "method=GET path=/pun/sys/dashboard/ format=html controller=DashboardController action=index status=200 duration=94.78 view=58.26"

I changed the Kubernetes cluster config to have config_file: "$HOME/.kube/config" instead of the tilde, which fixed it.
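
For anyone who hits the same error: tilde expansion is a shell feature, so when a literal ~ reaches mkdir it’s treated as a plain directory name. A quick illustration:

cd /                      # somewhere the user can't write
mkdir '~'                 # mkdir: cannot create directory '~': Permission denied
mkdir -p "$HOME/.kube"    # $HOME is a real path, so this works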

Nice, now I’m at 99% done… I just don’t have anything to run yet, but I confirmed that the cluster data is populated and the cert is OK, and if I run on the CLI:

kubectl --context=ondemand create secret generic stuff -n ood-username

it creates something, so I think this is the end of the road here. I’ll have to work through the sample exercise of getting a simple job running and see what happens. I’ll post the entire instruction list once I get it all sorted.
