Twice in the past two months, a user has reported that their Jupyter Notebook session never starts. OOD just shows “Your session is currently starting…” on the Sessions page, the “Connect to Jupyter” button never appears, and the session stays in that state until the user cancels the job. The output.log file contains messages indicating that the port Jupyter Notebook was assigned was already in use:
[I 10:43:47.997 NotebookApp] The port 50530 is already in use.
[C 10:43:47.998 NotebookApp] ERROR: the notebook server could not be started because port 50530 is not available.
On the most recent occasion, I was able to get onto the node in question and confirmed that the port was indeed in use, as the local port of an outgoing connection from another user’s job. Since that job started before the failing one, I suspect OOD’s find_port functionality may not be treating ports held by outgoing connections as in use. However, we haven’t ruled out a user-side cause yet: only one user has ever reported this, and given the large range of ports OOD searches, hitting a collision like this should be very unlikely. That said, this user isn’t using any Conda environments and has very little in their .bashrc or .bash_profile. Given how our environment is set up, we don’t have a good way to check every user’s output.log files for similar messages to see how widespread this issue might be. So we’re wondering whether anyone else has seen similar behavior, or can think of anything else that might trigger this.
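One way to test the outgoing-connection theory outside OOD is the minimal Python sketch below. It is not OOD’s code (port_used_nc is a bash function that calls nc); `connect_probe`, `bind_probe`, and `demo` are hypothetical helpers, and the sketch assumes Linux and runs entirely over loopback. It opens an outgoing TCP connection, then checks that connection’s local port two ways: a connect probe, which approximates a connect-style check like the nc command in port_used_nc, and a bind probe, which asks the kernel whether the port can actually be bound. For context, 50530 falls inside Linux’s default ephemeral port range (32768 to 60999, controlled by `net.ipv4.ip_local_port_range`), so a port like the one Jupyter was given can also be handed out as the local port of an outgoing connection.

```python
from __future__ import annotations

import errno
import socket


def connect_probe(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if something accepts a TCP connection on host:port.

    Approximates a connect-style check: only a *listening* socket can
    accept the connection, so other uses of the port are invisible to it.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def bind_probe(host: str, port: int) -> bool:
    """Return True if host:port cannot be bound because it is already in use."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError as e:
            if e.errno == errno.EADDRINUSE:
                return True
            raise  # unrelated failure (e.g. bad address): surface it
        return False


def demo(host: str = "127.0.0.1") -> tuple[int, bool, bool]:
    """Probe the local port of an outgoing connection with both checks."""
    # A listener on an ephemeral port stands in for some remote service.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind((host, 0))
        server.listen()
        # The client's outgoing connection gets its own local port, like the
        # connection held by the other user's job.
        with socket.create_connection(server.getsockname()) as client:
            local_port = client.getsockname()[1]
            return (local_port,
                    connect_probe(host, local_port),
                    bind_probe(host, local_port))


if __name__ == "__main__":
    port, by_connect, by_bind = demo()
    print(f"outgoing local port {port}: connect probe in use={by_connect}, "
          f"bind probe in use={by_bind}")
```

On Linux, the connect probe is expected to report the outgoing port as free, since nothing is listening on it, while the bind probe fails with `EADDRINUSE`, matching the “already in use” message in the log. A bind-based check still has a race (the port can be taken between the check and Jupyter’s own bind), so it narrows the window rather than closing it.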
I haven’t really seen that much, but IIRC the script picks a port and then tries to connect to it to determine whether it’s already open. It does this repeatedly: generate a random number, then check whether that port is already open. If it isn’t, that’s the port; if it is, the script should try another.
Can you share the output.log from the failed job? It looks like the script didn’t correctly move on to a different port.
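The retry loop described in this reply can be sketched in Python as follows. This is an approximation under stated assumptions, not OOD’s bash implementation: the name `find_port` mirrors the function discussed in this thread, but the signature, the 2000 to 65535 range, and the `max_attempts` cap are illustrative choices. The `is_used` predicate is pluggable, so the same loop can be driven by a connect-style check or a stricter one, and `log` is called for every candidate so each retry leaves a trace in the job output.

```python
from __future__ import annotations

import random
from typing import Callable


def find_port(
    is_used: Callable[[int], bool],
    low: int = 2000,
    high: int = 65535,
    max_attempts: int = 100,
    rng: random.Random | None = None,
    log: Callable[[str], None] = print,
) -> int:
    """Draw random ports in [low, high] until `is_used` reports one as free.

    Hypothetical sketch: every candidate is logged so retries are visible.
    """
    rng = rng if rng is not None else random.Random()
    for attempt in range(1, max_attempts + 1):
        port = rng.randint(low, high)  # inclusive on both ends
        if is_used(port):
            log(f"attempt {attempt}: port {port} is in use, trying another")
            continue
        log(f"attempt {attempt}: port {port} appears free")
        return port
    raise RuntimeError(f"no free port found after {max_attempts} attempts")


if __name__ == "__main__":
    # Pretend every even-numbered port is taken, to force some retries.
    chosen = find_port(lambda p: p % 2 == 0)
    print(f"chose port {chosen}")
```

Because the loop trusts `is_used` completely, a predicate that misses some kinds of in-use ports (for example, ports held only by outgoing connections) returns such a port on the first draw, and no retry is ever logged.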
Thanks for responding so quickly! Here’s the log (with the username redacted):
Setting VNC password…
Starting VNC server…
Killing Xvnc process ID 2184673
Xvnc process ID 2184673 already killed
Desktop 'TurboVNC: armis20002.arc-ts.umich.edu:3 ()' started on display armis20002.arc-ts.umich.edu:3
Log file is vnc.log
Successfully started VNC server on armis20002.arc-ts.umich.edu:5903…
Script starting…
Waiting for Jupyter server to open port 50530…
Starting main script…
TTT - Mon Oct 27 10:43:41 EDT 2025
Creating launcher wrapper script…
TTT - Mon Oct 27 10:43:41 EDT 2025
Creating custom Jupyter kernels…
TTT - Mon Oct 27 10:43:41 EDT 2025
Creating custom Jupyter kernels from user-created Conda environments…
TTT - Mon Oct 27 10:43:41 EDT 2025
Loading python module
Running mod command, if provided.
A job setup file was not provided.
Currently Loaded Modules:
python3.9-anaconda/2021.11
TTT - Mon Oct 27 10:43:41 EDT 2025
jupyter kernelspec list
Available kernels:
sas /home//.local/share/jupyter/kernels/sas
python3 /sw/pkgs/arc/python3.9-anaconda/2021.11/share/jupyter/kernels/python3
TTT - Mon Oct 27 10:43:45 EDT 2025
grep -q python3.11-anaconda
module list
module list
grep -q mamba
'[' exclusive = shared ']'
echo -e '\n\n ******* Not using '\''srun'\'' due to either no gpus or gpus used in exclusive mode only. ******* \n\n'
******* Not using 'srun' due to either no gpus or gpus used in exclusive mode only. *******
jupyter notebook --config=/home//ondemand/data/sys/dashboard/batch_connect/sys/arcts_jupyter_notebook/output/c9b3904f-c922-4cdb-af69-62e2561a048b/config.py
[I 10:43:47.997 NotebookApp] The port 50530 is already in use.
[C 10:43:47.998 NotebookApp] ERROR: the notebook server could not be started because port 50530 is not available.
slurmstepd: error: *** JOB 10776188 ON armis20002 CANCELLED AT 2025-10-27T10:46:04 ***
There’s no sign of an attempt at any other port, but the find_port and port_used functions don’t have many echo statements, so I’m not certain whether retries would show up in the log at all.
I did test the nc check in port_used_nc. From what I found, it only returns 0 when something is listening on the port and that listener either has no connections yet or is set to accept multiple connections. I checked this by running nc -l -p 12345 to listen on a port (adding the -k flag for the multi-connection scenario), then connecting to that port with nc localhost 12345. After that, I ran the nc command from port_used_nc against both the listening port and the client’s outgoing port to see what it returned.
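The nc experiment above can also be reproduced in a single Python process, which makes it easy to rerun on a compute node. In this sketch, the helper names are hypothetical, `connect_probe` only approximates the nc command in port_used_nc, and Linux is assumed. One listener plays the role of `nc -l`, and the probe is run four times: while the listener is idle, after it has accepted a client but is still listening (the `-k` case), after it stops listening but keeps the accepted connection (modeling what was observed without `-k`), and against the client’s outgoing port. An ephemeral port replaces 12345 to avoid colliding with anything already on the node.

```python
from __future__ import annotations

import socket


def connect_probe(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds (a listener exists)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def drain_one(listener: socket.socket) -> None:
    """Accept and discard one queued connection (e.g. the probe's own)."""
    conn, _ = listener.accept()
    conn.close()


def reproduce_nc_experiment(host: str = "127.0.0.1") -> dict[str, bool]:
    results: dict[str, bool] = {}
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        listener.settimeout(5.0)  # never hang on accept() if a probe fails
        listener.bind((host, 0))  # ephemeral port instead of a fixed 12345
        listener.listen()
        port = listener.getsockname()[1]

        # 1. Listening with no connections yet, like a fresh `nc -l`.
        results["listening_idle"] = connect_probe(host, port)
        drain_one(listener)

        # 2. A client is accepted and the listener keeps listening (`-k`).
        with socket.create_connection((host, port)) as client:
            conn, _ = listener.accept()
            with conn:
                results["listening_after_accept"] = connect_probe(host, port)
                drain_one(listener)

                # 3. Stop listening but keep the accepted connection open.
                listener.close()
                results["accepted_only"] = connect_probe(host, port)

                # 4. The client's outgoing (ephemeral) port never had a listener.
                results["client_outgoing"] = connect_probe(
                    host, client.getsockname()[1])
    finally:
        listener.close()  # safe to call twice
    return results


if __name__ == "__main__":
    for case, in_use in reproduce_nc_experiment().items():
        print(f"{case}: probe reports in use={in_use}")
```

On Linux, the first two results should be True and the last two False, consistent with the finding that the check only sees ports that currently have a live listener.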
Please let me know if you need anything else or have any questions. Thanks!