We’re setting up OnDemand for a new Slurm cluster. We already have working OnDemand instances on two of our other Slurm clusters, but we’re having some issues with the new setup, and I’d like some help debugging a 404 response.
This is the status of our OOD instance:
Has working shell access to the login node (through the “Shell Access” dropdown)
The Jupyter app can submit jobs that start the jupyter notebook server on the compute node. I’ve verified that the server is running on the compute node by ssh’ing onto the compute node and running jupyter notebook list
However, when I connect to the notebook server using OOD, I get a 404. Here are the details. I’ve double-checked the config files based on some of the other 404-related help posts on the forum, but no luck so far. Any other config files or logs I should check?
Transferred: 590 B (196 B size)
And they both match compute-ice-dev-slurm-5.pace.gatech.edu and compute-ice-dev-slurm-6.pace.gatech.edu (the two nodes on our dev cluster). \w matches [a-zA-Z0-9_], so I think it makes sense that (login|atl1|compute)[\w.-]* would match compute-ice-dev-slurm-5.
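For anyone who wants to reproduce the check outside of Apache, here is a minimal Python sketch of it. The pattern and the two hostnames come from this thread. The function names, the /node/<host>/<port> path shape and the port 8888 in the usage note are assumptions for illustration, and Python's re module is not the regex engine the portal uses, so treat a match here as a sanity check and not as proof.

```python
import re

# Pattern from the post. re.ASCII restricts \w to [a-zA-Z0-9_].
HOST_REGEX = r"(login|atl1|compute)[\w.-]*"

# The two dev-cluster nodes named in the post.
HOSTS = [
    "compute-ice-dev-slurm-5.pace.gatech.edu",
    "compute-ice-dev-slurm-6.pace.gatech.edu",
]


def host_matches(host: str, pattern: str = HOST_REGEX) -> bool:
    """Return True if the whole hostname matches the pattern."""
    return re.fullmatch(pattern, host, flags=re.ASCII) is not None


def node_path_matches(path: str, pattern: str = HOST_REGEX) -> bool:
    """Return True if a proxy-style path matches.

    Assumption: the path looks like /node/<host>/<port>/..., which is
    only an approximation of how the portal applies the host pattern.
    """
    full = rf"^/node/(?P<host>{pattern})/(?P<port>\d+)"
    return re.match(full, path, flags=re.ASCII) is not None


for host in HOSTS:
    print(host, host_matches(host))
```

Calling node_path_matches("/node/compute-ice-dev-slurm-5.pace.gatech.edu/8888/tree") exercises the same pattern against a full request path (8888 is just a placeholder port).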
As a sanity check, I did put your regex in ood_portal.yml, but I got the same errors as before.
I double-checked that the cluster in form.yml.erb (which is cluster: "ice-slurm") matches the intended filename (which is /etc/ood/config/clusters.d/ice-slurm.yml), so I’m not sure where the pace-ice cluster ID is coming from.
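One way to track down a stray ID like this is to search the config tree for the literal string. Below is a rough Python sketch of that search. The helper name find_mentions is hypothetical, and which directories are worth searching is an assumption; /etc/ood/config is the only path taken from this thread.

```python
from pathlib import Path


def find_mentions(root: str, needle: str) -> list[tuple[str, int, str]]:
    """List (file, line number, line) for every line under root containing needle."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            # errors="replace" keeps going on files that are not valid text.
            text = path.read_text(errors="replace")
        except OSError:
            # Skip files we are not allowed to read.
            continue
        for lineno, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append((str(path), lineno, line.strip()))
    return hits
```

For example, find_mentions("/etc/ood/config", "pace-ice") would list every config line that still refers to the old ID. The same call can be pointed at the app directory that holds form.yml.erb.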