Jupyter session enters bad state

I'm trying to debug why my Jupyter session ends up in the Bad state.

From the output.log:

TIMING - Starting main script at: Mon Jun  9 21:01:01 UTC 2025
TIMING - Starting jupyter at: Mon Jun  9 21:01:01 UTC 2025
/shared/home/abhi400/ondemand/data/sys/dashboard/batch_connect/sys/bc_example_jupyter/output/8fbc4a06-66dd-47de-a7a4-f0c187cecf83/script.sh: line 30: /shared/home/abhi400: Is a directory
[I 21:01:04.723 NotebookApp] Authentication of /metrics is OFF, since other authentication is disabled.
[W 21:01:05.665 NotebookApp] All authentication is disabled.  Anyone who can connect to this server will be able to run code.
[W 21:01:06.348 NotebookApp] Loading JupyterLab as a classic notebook (v6) extension.
[W 2025-06-09 21:01:06.351 LabApp] 'ip' has moved from NotebookApp to ServerApp. This config will be passed to ServerApp. Be sure to update your config before our next release.
[W 2025-06-09 21:01:06.351 LabApp] 'port' has moved from NotebookApp to ServerApp. This config will be passed to ServerApp. Be sure to update your config before our next release.
[W 2025-06-09 21:01:06.351 LabApp] 'token' has moved from NotebookApp to ServerApp. This config will be passed to ServerApp. Be sure to update your config before our next release.
[W 2025-06-09 21:01:06.351 LabApp] 'token' has moved from NotebookApp to ServerApp. This config will be passed to ServerApp. Be sure to update your config before our next release.
[W 2025-06-09 21:01:06.351 LabApp] 'token' has moved from NotebookApp to ServerApp. This config will be passed to ServerApp. Be sure to update your config before our next release.
[I 2025-06-09 21:01:06.365 LabApp] JupyterLab extension loaded from /shared/home/abhi400/jupyter-env/lib64/python3.7/site-packages/jupyterlab
[I 2025-06-09 21:01:06.365 LabApp] JupyterLab application directory is /shared/home/abhi400/jupyter-env/share/jupyter/lab
[I 21:01:06.377 NotebookApp] Serving notebooks from local directory: /shared/home/abhi400/ondemand/data/sys/dashboard/batch_connect/sys/bc_example_jupyter/output/8fbc4a06-66dd-47de-a7a4-f0c187cecf83
[I 21:01:06.377 NotebookApp] Jupyter Notebook 6.5.7 is running at:
[I 21:01:06.377 NotebookApp] http://general-dy-general-cr-1:6400/
[I 21:01:06.377 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[I 21:01:34.525 NotebookApp] 302 GET / (10.76.97.73) 0.510000ms
[C 21:02:03.464 NotebookApp] received signal 15, stopping
[I 21:02:03.470 NotebookApp] Shutting down 0 kernels
[I 21:02:03.470 NotebookApp] Shutting down 0 terminals

  _   _          _      _
 | | | |_ __  __| |__ _| |_ ___
 | |_| | '_ \/ _` / _` |  _/ -_)
  \___/| .__/\__,_\__,_|\__\___|
       |_|
                       
Read the migration plan to Notebook 7 to learn about the new features and the actions to take if you are using extensions.

https://jupyter-notebook.readthedocs.io/en/latest/migrate_to_notebook7.html

Please note that updating to Notebook 7 might break some of your extensions.

The script.sh.erb:

#!/usr/bin/env bash

# Redirect all output (stdout and stderr) to output.log
exec > output.log 2>&1

# Benchmark info
echo "TIMING - Starting main script at: $(date)"

# Set working directory to home directory
cd "${HOME}"

# Activate virtual environment
source /shared/home/abhi400/jupyter-env/bin/activate

#
# Start Jupyter Notebook Server
#

<%- unless context.modules.blank? -%>
# Purge the module environment to avoid conflicts
module purge

# Load the required modules
module load <%= context.modules %>

# List loaded modules
module list
<%- end -%>

# Benchmark info
# Benchmark info
echo "TIMING - Starting jupyter at: $(date)"

# Optional sleep to prevent OOD from missing log output
sleep 2

# Launch the Jupyter Notebook Server
#set -x
#jupyter notebook --config="${CONFIG_FILE}" <%= context.extra_jupyter_args %>
~
# Launch Jupyter explicitly on port 6400
jupyter notebook --no-browser --ip=0.0.0.0 --port=6400 --NotebookApp.token='' &

# Capture PID and wait later
JUPYTER_PID=$!

# Give it a few seconds to start
sleep 5

# Write connect.yml (for OOD to know it's alive)
cat <<EOF > "${PWD}/connect.yml"
---
host: $(hostname -i)
port: 6400
EOF

# Wait on the actual jupyter process
wait $JUPYTER_PID

EXIT_CODE=$?
echo "Jupyter exited with code $EXIT_CODE"
exit $EXIT_CODE

Hello and welcome!

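One thing I notice is a stray ~ on line 40 of the script.sh.erb. Bash expands a lone ~ to your home directory and then tries to run that path as a command, which would explain this error (line 40 of the template lands on line 30 of the rendered script.sh once the ERB module block is dropped):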

/shared/home/abhi400/ondemand/data/sys/dashboard/batch_connect/sys/bc_example_jupyter/output/8fbc4a06-66dd-47de-a7a4-f0c187cecf83/script.sh: line 30: /shared/home/abhi400: Is a directory

Things keep going after that line, so I don’t think it is the root cause of the failure, but you will still want to remove it.
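For what it's worth, here is a minimal sketch of why a lone ~ produces that message. The temporary directory and the file name stray_tilde.sh are made up for the demo, and HOME is pointed at the temporary directory only so the result does not depend on your real home directory.

```shell
# Demo only: a throwaway directory stands in for the home directory.
demo_home=$(mktemp -d)

# A script whose only line is a stray "~", like line 40 of script.sh.erb.
printf '~\n' > "$demo_home/stray_tilde.sh"

# bash expands "~" to $HOME and then tries to execute that path as a command.
tilde_output=$(HOME="$demo_home" LC_ALL=C bash "$demo_home/stray_tilde.sh" 2>&1)
tilde_status=$?

echo "exit status: $tilde_status"
echo "message: $tilde_output"

rm -rf "$demo_home"
```

The message ends in "Is a directory", the same as in your output.log. Your script.sh has no set -e, so bash moves on to the next line after the failure, which matches the log showing Jupyter still starting.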

Have you enabled the OOD proxy for the redirect already as documented here:

If so, have you also configured it correctly as documented here:

Thank you so much for pointing us in the right direction.

We added the following configuration to enable the reverse proxy:

# /etc/ood/config/ood_portal.yml
# Enable the reverse proxy feature
enable_reverse_proxy: true
host_regex: '[^/]+'
node_uri: '/node'
rnode_uri: '/rnode'

After that, we rebuilt the Apache configuration file and restarted httpd. However, the reverse proxy still doesn’t seem to be working.

We tried running a curl request from the OOD instance to: https://<ood-hostname>/node/<compute-node>:<port>/ but it returned a 404 Not Found error.

That said, we are able to curl directly to <compute-node>:<port> from the OOD instance and get a valid response, so it doesn’t appear to be a networking issue.

Any input on what we may be missing would be appreciated.

P.S. I did take care of the ~, thanks!

It looks like things are running, but the connection isn’t getting the right host. I notice you are using the host_regex straight from the example config. We specifically call out in the docs not to do that, because that regex is very open-ended. It is also very likely that the host name is just not being extracted correctly with that regex.

To start, use something like regex101 to test that regex against the hostnames you are using. From the curl results, it looks like things are running and it’s only the hostname extracted by the regex that is off.
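If you would rather check from a shell than from regex101, here is a rough sketch. Assumptions on my part: general-dy-general-cr-[0-9]+ is a hypothetical candidate based on the node name in your log, check_node_path is a helper name I made up, and the generated Apache config is assumed to match paths shaped like /node/<host>/<port>. sed uses POSIX extended regex while Apache uses PCRE, so only simple patterns carry over unchanged.

```shell
# Hypothetical candidate for host_regex; adjust to your cluster's node names.
# Assumes the pattern has no capture groups of its own and contains no "#".
host_regex='general-dy-general-cr-[0-9]+'

# Approximates matching a request path against /node/<host>/<port>.
# Prints the extracted host and port, or "no match".
check_node_path() {
  result=$(printf '%s\n' "$1" |
    sed -nE "s#^/node/(${host_regex})/([0-9]+)(/.*)?\$#host=\\1 port=\\2#p")
  echo "${result:-no match}"
}

check_node_path "/node/general-dy-general-cr-1/6400/"   # host=general-dy-general-cr-1 port=6400
check_node_path "/node/general-dy-general-cr-1:6400/"   # no match
check_node_path "/node/evil.example.com/6400/"          # no match
```

The second call uses a colon between host and port, as in the curl URL you described. If that is literally what you requested, it may be worth retrying with a slash between host and port before changing anything else.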