502 proxy error when opening file in Jupyter notebook

I just installed a Jupyter interactive app. I can launch it successfully and open Jupyter Notebook in the browser.

The problem is that whenever I open a file in Jupyter Notebook, it opens a new tab and starts loading, but it always ends with a 502 proxy error. I have no idea why this happens, as there is no error in output.log. It looks to me like a problem with the Apache server, but I don’t know how to find the root cause and fix it.

Error Message:

Proxy Error
The proxy server received an invalid response from an upstream server.
The proxy server could not handle the request

Reason: Error reading from remote server

Output Log

[I 22:30:14.310 NotebookApp] Serving notebooks from local directory: /data/home/xxxx
[I 22:30:14.311 NotebookApp] Jupyter Notebook 6.4.5 is running at:
[I 22:30:14.311 NotebookApp] http://c51-s001:54978/node/c51-s001/54978/
[I 22:30:14.311 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
Discovered Jupyter Notebook server listening on port 54978!
TIMING - Wait ended at: Tue Apr 25 22:30:14 CST 2023
Generating connection YAML file...
[I 22:32:16.428 NotebookApp] 302 POST /node/c51-s001/54978/login (xx) 3.320000ms
[I 22:32:16.525 NotebookApp] 302 GET /node/c51-s001/54978/ (xx) 0.920000ms

Hey, sorry for the trouble.

Could I see the portion of your ood_portal.yml for the reverse proxy, which is documented here:
https://osc.github.io/ood-documentation/latest/reference/files/ood-portal-yml.html#configure-reverse-proxy

I’m mainly looking to see the node_uri and the host_regex.

Sure.

auth:
  - 'AuthType Basic'
  - 'AuthName "Open OnDemand"'
  - 'AuthBasicProvider PAM'
  - 'AuthPAMService ood'
  - 'Require valid-user'
# Capture system user name from authenticated user name
user_map_match: '.*'
# For reverse proxy
host_regex: 'c.*'
node_uri: '/node'
rnode_uri: '/rnode'
# general
servername: localhost
proxy_server: localhost:8080

Note: our compute nodes are named following the pattern c01-s001, c01-s002, ...

Ok, nothing there looks wrong.

Have you checked the logs for apache when the error occurs? The various logs for OOD can be found here: Logging — Open OnDemand 3.0.0 documentation

I’d be curious to know the entries in the /var/log/httpd/<hostname>_error.log around when the 502 happens.

[Tue Apr 25 23:25:37.648090 2023] [lua:info] [pid 2131] [client ::1:48942] req_protocol="HTTP/1.1" req_handler="proxy-server" req_method="GET" req_accept="application/json, text/javascript, */*; q=0.01" req_user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36 Edg/112.0.1722.48" res_content_length="341" req_content_type="" res_content_encoding="" req_status="502" req_origin="" allowed_hosts="localhost,localhost:8086" time_user_map="0.003" local_user="whxu" req_referer="http://localhost:8086/node/c51-s001/54978/tree" res_content_language="" req_port="8086" req_is_websocket="false" req_server_name="localhost" log_hook="ood" req_accept_charset="" req_hostname="localhost" res_content_type="text/html; charset=iso-8859-1" res_content_location="" res_location="" log_time="2023-04-25T15:25:37.647842Z" remote_user="whxu" res_content_disp="" req_user_ip="::1" req_is_https="false" req_filename="proxy:http://c51-s001:54978/node/c51-s001/54978/api/contents?type=directory&_=1682435653093" req_uri="/node/c51-s001/54978/api/contents" time_proxy="60009.973" log_id="ZEfwtcQuX36yoBGq5c-MdAAAAAk" req_accept_language="en" req_cache_control="" req_accept_encoding="gzip, deflate, br", referer: http://localhost:8086/node/c51-s001/54978/tree

It looks like a timeout issue: time_proxy="60009.973". Is there a way to increase the timeout?

There is also a WebSocket connection failure when Jupyter tries to connect to the kernel.
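For anyone reading these lua:info entries, a small script can pull out the fields that matter. This is only a sketch: the log line below is a shortened copy of the entry above, parse_fields is a made-up helper name, and reading time_proxy as milliseconds is an assumption (it would put this request at roughly 60 seconds).

```python
import re

# Shortened copy of the lua:info entry above; only a few fields are kept.
log_line = (
    'req_status="502" req_is_websocket="false" '
    'req_uri="/node/c51-s001/54978/api/contents" time_proxy="60009.973"'
)


def parse_fields(line):
    """Return the key="value" pairs of an OOD lua log line as a dict."""
    return dict(re.findall(r'(\w+)="([^"]*)"', line))


fields = parse_fields(log_line)

# Assumption: time_proxy is reported in milliseconds.
proxy_seconds = float(fields["time_proxy"]) / 1000

print(fields["req_status"], fields["req_uri"], round(proxy_seconds, 1))
```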

[Wed Apr 26 08:58:33.874324 2023] [proxy:error] [pid 16116] (111)Connection refused: AH00957: WS: attempt to connect to 10.1.51.35:80 (*) failed
[Wed Apr 26 08:58:33.874373 2023] [proxy_wstunnel:error] [pid 16116] [client ::1:36078] AH02452: failed to make connection to backend: c51-s001
[Wed Apr 26 08:58:33.875365 2023] [lua:info] [pid 16116] [client ::1:36078] req_protocol="HTTP/1.1" req_handler="proxy-server" req_method="GET" req_accept="" req_user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36 Edg/112.0.1722.48" res_content_length="54" req_content_type="" res_content_encoding="" req_status="503" req_origin="http://localhost:8086" allowed_hosts="localhost,localhost:8086" time_user_map="0.004" local_user="whxu" req_referer="" res_content_language="" req_port="8086" req_is_websocket="true" req_server_name="localhost" log_hook="ood" req_accept_charset="" req_hostname="localhost" res_content_type="text/html; charset=iso-8859-1" res_content_location="" res_location="" log_time="2023-04-26T00:58:33.875133Z" remote_user="whxu" res_content_disp="" req_user_ip="::1" req_is_https="false" req_filename="proxy:ws://c51-s001/52418/api/kernels:415697/node/c51-s001/52418/api/kernels/415697fa-6116-4332-8f4d-266e5fc750ea/channels?session_id=763b4acbcec644bc8dbd10d1e16daa62" req_uri="/node/c51-s001/52418/api/kernels/415697fa-6116-4332-8f4d-266e5fc750ea/channels" time_proxy="1.549" log_id="ZEh3OUtlhFnthHWtAcYt4QAAAAE" req_accept_language="en" req_cache_control="no-cache" req_accept_encoding="gzip, deflate, br"

What I find interesting is the attempt to connect to 10.1.51.35:80. That IP belongs to the host c51-s001, but I don’t know why it tries to connect to port 80 instead of 52418, which is the port Jupyter Notebook is listening on.

In my browser, I see the request is sent to ws://localhost:8086/node/c51-s001/52418/api/kernels/415697fa-6116-4332-8f4d-266e5fc750ea/channels, which looks correct to me. So I think the issue may be in the reverse proxy.
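One way to see where port 80 may come from is to parse the req_filename from the 503 entry above. The sketch below uses Python's urllib.parse, which is not what Apache uses internally, so treat it only as an illustration of how that URL splits into host, port, and path.

```python
from urllib.parse import urlsplit

# req_filename from the 503 log entry, without the leading "proxy:" prefix.
target = (
    "ws://c51-s001/52418/api/kernels:415697/node/c51-s001/52418/api/kernels/"
    "415697fa-6116-4332-8f4d-266e5fc750ea/channels"
    "?session_id=763b4acbcec644bc8dbd10d1e16daa62"
)

parts = urlsplit(target)

print(parts.hostname)  # host part of the URL
print(parts.port)      # None when the URL carries no explicit port
print(parts.path)      # everything after the host, up to the query string
```

The host part is just c51-s001 with no port attached, and the default port for ws:// is 80. The expected port 52418 ends up inside the path, which suggests the proxy target was assembled from the wrong pieces of the request URL.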

The timeout would be something that needs to be adjusted on the Slurm side; the proxy has no setting for that.

It is also strange to me that the IP is being used for the connection, I would have expected the compute cluster’s hostname there. That also makes me wonder about some DNS configuration.

Could you post the contents of the connection.yml when you submit?

One thing you could try is to check the status of the Slurm job running on that node to see if it’s still running or if it has timed out. You can do this by running the command squeue -u <username> -n <jobname> (replacing <username> and <jobname> with the appropriate values).

Here is the content of connection.yml

host: c51-s001
port: 28810
password: 3EAOTolIvLZ1he3y

I don’t know what is happening, but sometimes it just works after restarting the Jupyter kernel. I don’t know where localhost:80 is coming from, as there is no such setting anywhere.

I now have another issue: a 503 when I try to rename a file in Jupyter Notebook. This is just a guess, but it seems Apache doesn’t allow the PATCH method. Is there a way to fix this through the OOD configuration file?

curl "http://localhost:8086/node/c51-s001/36287/api/sessions/13a54303-e5d2-44e2-b1da-56b9876b074c" ^
  -X "PATCH" ^
  -H "Accept: application/json, text/javascript, */*; q=0.01" ^
  -H "Accept-Language: en" ^
  -H "Authorization: Basic d2h4dTpoZW5yeTIwMjM=" ^
  -H "Connection: keep-alive" ^
  -H "Content-Type: application/json" ^
  -H "Cookie: username-c51-s001-36287=^\^"2^|1:0^|10:1682562604^|23:username-c51-s001-36287^|44:NzFlN2JlNDVhMzFlNGMyYmJiYjIzNTZkYWUzYmJiODI=^|ad13c7f5d17e4c24d593eb1887ed363953ebbe1d09d8a3689b708728d5fefa20^\^"; _xsrf=2^|01e8c4df^|34b69058864a58f43a6f24978f65cdfa^|1682562604" ^
  -H "Origin: http://localhost:8086" ^
  -H "Referer: http://localhost:8086/node/c51-s001/36287/notebooks/demo.ipynb" ^
  -H "Sec-Fetch-Dest: empty" ^
  -H "Sec-Fetch-Mode: cors" ^
  -H "Sec-Fetch-Site: same-origin" ^
  -H "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36 Edg/112.0.1722.58" ^
  -H "X-Requested-With: XMLHttpRequest" ^
  -H "X-XSRFToken: 2^|01e8c4df^|34b69058864a58f43a6f24978f65cdfa^|1682562604" ^
  -H "sec-ch-ua: ^\^"Chromium^\^";v=^\^"112^\^", ^\^"Microsoft Edge^\^";v=^\^"112^\^", ^\^"Not:A-Brand^\^";v=^\^"99^\^"" ^
  -H "sec-ch-ua-mobile: ?0" ^
  -H "sec-ch-ua-platform: ^\^"Windows^\^"" ^
  --data-raw "^{^\^"path^\^":^\^"demo01.ipynb^\^",^\^"type^\^":^\^"notebook^\^",^\^"name^\^":^\^"^\^",^\^"kernel^\^":^{^\^"id^\^":^\^"d396e93f-ac1f-4f99-98a0-a1399db02de1^\^",^\^"name^\^":^\^"python3^\^"^}^}" ^
  --compressed

One of my colleagues noticed that the proxy fails when a path segment in the URL starts with a digit. For example, in ws://localhost:8086/node/c51-s001/52418/api/kernels/415697fa-6116-4332-8f4d-266e5fc750ea/channels, the segment 415697fa-6116-4332-8f4d-266e5fc750ea starts with a digit, and the proxy has a problem; otherwise it works. I haven’t dug into the code to confirm this, but I observe the same behavior in my case. Maybe some regular expression doesn’t take this into account.

It turns out this is caused by the greedy matching of the host_regex: 'c.*' pattern. It was fixed after changing it to host_regex: 'c.*?'.

I also created a PR to address this on the OOD side, since it would be hard for users to figure it out on their own: fix node proxy location match pattern by link89 · Pull Request #2784 · OSC/ondemand (github.com)
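The effect of greedy matching can be reproduced with a short Python sketch. The pattern shape below, ^/node/(host_regex)/(port digits), is an assumption inferred from the node_uri setting and the req_filename in the log. The actual Apache rule generated by OOD may differ, and split_node_url is a hypothetical helper name.

```python
import re


def split_node_url(host_regex, path):
    """Split a /node/<host>/<port> path using an assumed OOD-like pattern."""
    # Hypothetical pattern shape; the real Apache rule may differ.
    pattern = rf"^/node/(?P<host>{host_regex})/(?P<port>\d+)"
    m = re.match(pattern, path)
    return (m.group("host"), m.group("port")) if m else None


# Request path from the failing WebSocket connection in the log above.
kernel_path = (
    "/node/c51-s001/52418/api/kernels/"
    "415697fa-6116-4332-8f4d-266e5fc750ea/channels"
)

greedy = split_node_url("c.*", kernel_path)      # original host_regex
lazy = split_node_url("c.*?", kernel_path)       # fixed host_regex
no_slash = split_node_url("c[^/]+", kernel_path)  # host may not contain "/"

print(greedy)
print(lazy)
print(no_slash)
```

Under this assumption, the greedy pattern lets the host group run on to the last "/" that is followed by a digit, so the host becomes c51-s001/52418/api/kernels and the port becomes 415697, which matches the ws://c51-s001/52418/api/kernels:415697 target seen in the log. The lazy pattern stops at the first "/" followed by digits and yields c51-s001 and 52418. A host pattern that cannot match "/" at all, such as c[^/]+, gives the same result regardless of greediness.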
