Debugging 404 in Jupyter on new cluster

RonRahaman · March 23, 2023, 3:17pm

Hi all,

We’re setting up OnDemand for a new slurm cluster. We have a working OnDemand instance on two of our other slurm clusters. However, we’re having some issues with the new setup, and I’d like some help debugging a 404 response.

This is the status of our OOD instance:

Has working shell access to the login node (through the “Shell Access” dropdown)
The Jupyter app can submit jobs that start the jupyter notebook server on the compute node. I’ve verified that the server is running on the compute node by ssh’ing onto the compute node and running jupyter notebook list

However, when I connect to the notebook server using OOD, I get a 404. Here are the details. I’ve double-checked the config files based on some of the other 404-related help posts on the forum, but no luck so far. Any other config files or logs I should check?

Request/response:

POST https://ondemand-dev-ice.pace.gatech.edu/node/compute-ice-dev-slurm-5.pace.gatech.edu/8275/login
Status: 404 Not Found
Version: HTTP/1.1
Transferred: 590 B (196 B size)
Referrer Policy: strict-origin-when-cross-origin
Request Priority: Highest

Request header:

POST /node/compute-ice-dev-slurm-5.pace.gatech.edu/8275/login HTTP/1.1
Host: ondemand-dev-ice.pace.gatech.edu
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:109.0) Gecko/20100101 Firefox/111.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate, br
Referer: https://ondemand-dev-ice.pace.gatech.edu/pun/sys/dashboard/batch_connect/sessions
Content-Type: application/x-www-form-urlencoded
Content-Length: 25
Origin: https://ondemand-dev-ice.pace.gatech.edu
Connection: keep-alive
Cookie: MOD_AUTH_CAS_S=23179e65d04f90ca70eb770fdbb6a559
Upgrade-Insecure-Requests: 1
Sec-Fetch-Dest: document
Sec-Fetch-Mode: navigate
Sec-Fetch-Site: same-origin
Sec-Fetch-User: ?1
Sec-GPC: 1

Response header:

HTTP/1.1 404 Not Found
Date: Thu, 23 Mar 2023 14:00:11 GMT
Server: Apache/2.4.34 (Red Hat) OpenSSL/1.0.2k-fips
Content-Security-Policy: frame-ancestors https://ondemand-dev-ice.pace.gatech.edu;
Strict-Transport-Security: max-age=63072000; includeSubDomains; preload
Content-Length: 196
Keep-Alive: timeout=5, max=99
Connection: Keep-Alive
Content-Type: text/html; charset=iso-8859-1

travert · March 23, 2023, 3:36pm

Is there anything in the logs to correlate with the request? I’d check the logs in:
/var/log/httpd/<hostname>_error.log

I’d also check the logs in:
~/ondemand/data/sys/dashboard/batch_connect/sys/<app>/output/<session id>/output.log

Those show what commands are running. I’d be especially curious whether anything in the output.log or the connection.yml looks wrong.

Finding some errors in the log would be a good first step.

travert · March 23, 2023, 3:40pm

Also, in your ood_portal.yml how did you configure the reverse proxy? Seeing those fields would help as well to ensure the generated url isn’t wonky.

RonRahaman · March 23, 2023, 3:52pm

Here’s the output.log and connection.yml. The output.log shows that the notebook server is running and listening on the reported port. I don’t see any error logs in /var/log/httpd/

output.log.txt (7.1 KB)
connection.yml (84 Bytes)

RonRahaman · March 23, 2023, 3:56pm

And here’s my ood_portal.yml

ood_portal.yml (2.5 KB)

travert · March 23, 2023, 4:23pm

Try adjusting that host_regex as it does not work to capture the pattern for compute-ice-dev-slurm-5.pace.gatech.edu.

I played with it in regex101 and got this to work, but make sure to check it works for your expected cases:
(login|atl1|compute\d*)[\w.-]*\.pace\.gatech\.edu

RonRahaman · March 23, 2023, 6:16pm

Hmm, I double checked both of these in regex101:

(login|atl1|compute)[\w.-]*\.pace\.gatech\.edu
(login|atl1|compute\d*)[\w.-]*\.pace\.gatech\.edu

And they both match compute-ice-dev-slurm-5.pace.gatech.edu and compute-ice-dev-slurm-6.pace.gatech.edu (which are the two nodes on our dev cluster). The \w will match [a-zA-Z0-9_], so I think it makes sense that having (login|atl1|compute)[\w.-]* would match compute-ice-dev-slurm-5.
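The regex101 check above can also be reproduced locally. This is only a minimal sketch: it uses Python's `re.fullmatch` as a stand-in for however the proxy anchors `host_regex`, which is an assumption, and the names `PATTERNS`, `HOSTS`, and `check` are made up for illustration.

```python
import re

# The two candidate host_regex values discussed in this thread.
PATTERNS = {
    "original": r"(login|atl1|compute)[\w.-]*\.pace\.gatech\.edu",
    "suggested": r"(login|atl1|compute\d*)[\w.-]*\.pace\.gatech\.edu",
}

# The two compute nodes on the dev cluster.
HOSTS = [
    "compute-ice-dev-slurm-5.pace.gatech.edu",
    "compute-ice-dev-slurm-6.pace.gatech.edu",
]


def check(patterns, hosts):
    """Return {(pattern_name, host): bool} for a whole-string match."""
    return {
        (name, host): re.fullmatch(pattern, host) is not None
        for name, pattern in patterns.items()
        for host in hosts
    }


results = check(PATTERNS, HOSTS)
for (name, host), matched in results.items():
    print(f"{name:10s} {host}: {'match' if matched else 'NO MATCH'}")
```

If every pair reports a match, the regex itself is probably not what is producing the 404, and the next thing to look at is whether the running Apache config actually contains it.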

As a sanity check, I did put your regex in ood_portal.yml, but I got the same errors as before.

RonRahaman · March 23, 2023, 6:32pm

A coworker who was debugging this noticed this in their nginx log:

App 95505 output: [2023-03-23 11:36:46 -0400 ] ERROR "Session specifies nonexistent 'pace-ice' cluster id."

I double-checked that the cluster in form.yml.erb (which is cluster: "ice-slurm") matches the intended filename (which is /etc/ood/config/clusters.d/ice-slurm.yml), so I’m not sure where the pace-ice cluster ID is coming from.

travert · March 23, 2023, 6:38pm

Which regex are you testing?

In the ood_portal you provided this is the regex I saw and tested:
host_regex: "(login|atl1|compute)[\\w.-]+\\.pace\\.gatech\\.edu"

Which didn’t match the hostname I saw for the compute node.

I also want to confirm you issued the update_ood_portal command and restarted ood after the host_regex change.

RonRahaman · March 23, 2023, 6:48pm

I’m now using:

host_regex: '(login|atl1|compute\d*)[\w.-]*\.pace\.gatech\.edu'
node_uri: '/node'
rnode_uri: '/rnode'

After making that change, I only ran “Restart Web Server” from the OOD webpage. I hadn’t run update_ood_portal. I can try that and let you know.

travert · March 23, 2023, 7:07pm

Ok, yeah you will need to run that command to load that new config in, which will close connections so be aware if you have active users.

I am curious now though, what are the names of the cluster configs on the file system?
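Because edits to ood_portal.yml only take effect once update_ood_portal regenerates the Apache config, a quick staleness check can catch this situation. The following is a rough sketch, not an OOD tool: the file locations are assumptions about a typical install (adjust them for yours), the helper `is_stale` is hypothetical, and comparing modification times is only a heuristic.

```python
from pathlib import Path

# Assumed locations; these vary by OS and OOD version, so adjust as needed.
PORTAL_YML = Path("/etc/ood/config/ood_portal.yml")
CANDIDATE_CONFS = [
    Path("/opt/rh/httpd24/root/etc/httpd/conf.d/ood-portal.conf"),
    Path("/etc/httpd/conf.d/ood-portal.conf"),
]


def is_stale(source: Path, generated: Path) -> bool:
    """Return True if `generated` is missing or older than `source`."""
    if not generated.exists():
        return True
    return generated.stat().st_mtime < source.stat().st_mtime


if __name__ == "__main__":
    if not PORTAL_YML.exists():
        print(f"{PORTAL_YML} not found; is this the OOD web host?")
    else:
        for conf in CANDIDATE_CONFS:
            if conf.exists():
                stale = is_stale(PORTAL_YML, conf)
                state = "older than ood_portal.yml, rerun update_ood_portal" if stale else "up to date"
                print(f"{conf}: {state}")
```

A stale result only suggests the generated config predates the YAML edit; running update_ood_portal and restarting Apache is still the actual fix.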

RonRahaman · March 23, 2023, 8:00pm

Running update_ood_portal took care of everything, thanks!

We’re working on a dev instance, prior to rolling them out to prod, so running the command didn’t affect anyone.

travert · March 23, 2023, 8:09pm

Awesome! Glad it worked out, let us know if you have any more questions.

system · September 19, 2023, 8:09pm

This topic was automatically closed 180 days after the last reply. New replies are no longer allowed.
