Jupyter notebook shut down, slurm job remains active

This seems somewhat related, but not identical, to Interactive App completed, slurm job remains active, with a couple of differences:

  1. It’s 100% reproducible and not a one-off. Any Jupyter job that uses more than 1 core gets into this weird state when completed (*)
  2. I see no errors when querying the queues, just an almost infinite and very frequent loop of
App 1715142 output: [2026-01-27 15:03:28 -0700 ]  INFO "execve = [{\"SLURM_CONF\"=>\"/etc/slurm/slurm.conf\"}, \"/bin/squeue\", \"--all\", \"--states=all\", \"--noconvert\", \"-o\", \"\\u001E%a\\u001F%A\\u001F%B\\u001F%c\\u001F%C\\u001F%d\\u001F%D\\u001F%e\\u001F%E\\u001F%f\\u001F%F\\u001F%g\\u001F%G\\u001F%h\\u001F%H\\u001F%i\\u001F%I\\u001F%j\\u001F%J\\u001F%k\\u001F%K\\u001F%l\\u001F%L\\u001F%m\\u001F%M\\u001F%n\\u001F%N\\u001F%o\\u001F%O\\u001F%q\\u001F%P\\u001F%Q\\u001F%r\\u001F%S\\u001F%t\\u001F%T\\u001F%u\\u001F%U\\u001F%v\\u001F%V\\u001F%w\\u001F%W\\u001F%x\\u001F%X\\u001F%y\\u001F%Y\\u001F%z\\u001F%Z\\u001F%b\", \"-j\", \"10151\"]"

Jupyter correctly shuts itself down, and the attempt to reconnect rightly fails with an almost empty page containing only Failed to connect to node. The user log terminates with

[I 2026-01-27 14:26:25.655 YDocExtension] Saving file: Untitled.ipynb
[I 2026-01-27 14:26:31.249 ServerApp] Kernel shutdown: 3cb08703-75e9-4f96-b33b-cdf174f33273
[I 2026-01-27 14:26:31.516 ServerApp] Shutting down on /api/shutdown request.
[I 2026-01-27 14:26:31.517 ServerApp] Shutting down 9 extensions
[I 2026-01-27 14:26:31.517 YDocExtension] Deleting all rooms.

which is almost identical to single-core runs, except that those have two additional lines after this, namely

Cleaning up...

However, the lab application never returns, so if I log into the nodes where the problem is occurring, I see the processes started by script.sh (namely jupyter lab --config="${CONFIG_FILE}") still running. If I kill them, the slurm job terminates. The issue is probably also related to

in which case it is clearly not an OnDemand issue; however, it affects OnDemand more than other contexts, given that the end user has no easy access to the process to Ctrl-C or otherwise kill it.

Has anybody encountered this problem and found a solution or workaround?

What is your ProctrackType on the Slurm side? I feel like proctrack/cgroup works well but the others are buggy.

I’d wonder what the process trees for those other kernels are.

Though I do wonder about the cpu allocation being the difference here…
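To gather those process trees on an affected compute node, something along these lines might help (a sketch only: the job id is taken from the squeue loop above, and the cgroup path assumes a cgroup v1 layout, which varies by site and Slurm version):

```shell
# Sketch: inspect what a lingering Slurm job left behind on a compute node.
# JOBID and the cgroup path are assumptions; adjust for your site.
JOBID=10151

# Full process tree for the current user; leftover
# `jupyter lab --config=...` processes should show up here.
ps -u "$(id -un)" --forest -o pid,ppid,stat,etime,cmd

# With proctrack/cgroup, the job's PIDs are also listed in its cgroup
# (cgroup v1 path shown; cgroup v2 layouts differ).
CG="/sys/fs/cgroup/cpu/slurm/uid_$(id -u)/job_${JOBID}"
if [ -d "${CG}" ]; then
    cat "${CG}/cgroup.procs"
fi
```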

Thanks Jeff.

I indeed have ProctrackType=proctrack/cgroup in my slurm.conf
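i.e. the relevant fragment looks like this (the TaskPlugin line is shown only as a typical companion setting, not taken from my config; sites vary):

```
# slurm.conf (fragment)
ProctrackType=proctrack/cgroup
# task/cgroup is a common companion setting, included here only as an example:
TaskPlugin=task/cgroup,task/affinity
```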

Trying a few more times to collect the process trees and CPU allocations, I noticed that sometimes it does succeed in completing. I am now wondering if the problem is a race condition between the autosave and the shutdown: if the latter is issued while the autosave is happening (which I may have managed to hit multiple times in a row with different process counts), the lab remains “hanging” (it never completes; as a test I left it hanging for a day).

Even if that’s the case (I will run more tests and report back), I’m surprised to be the only one who has noticed this issue…

Since it most likely depends on the versions of the components involved, here is the pip freeze of the environment loaded by OnDemand before starting JupyterLab:

$ pip freeze
aiohappyeyeballs==2.6.1
aiohttp==3.13.3
aiosignal==1.4.0
anyio==4.12.1
argon2-cffi==25.1.0
argon2-cffi-bindings==25.1.0
arrow==1.4.0
asttokens==3.0.1
async-lru==2.1.0
attrs==25.4.0
babel==2.17.0
beautifulsoup4==4.14.3
bleach==6.3.0
bokeh==3.8.2
certifi==2026.1.4
cffi==2.0.0
charset-normalizer==3.4.4
click==8.3.1
cloudpickle==3.1.2
comm==0.2.3
contourpy==1.3.3
cycler==0.12.1
dask==2026.1.1
dask_labextension==7.0.0
debugpy==1.8.19
decorator==5.2.1
defusedxml==0.7.1
distributed==2026.1.1
executing==2.2.1
fastjsonschema==2.21.2
fonttools==4.61.1
fqdn==1.5.1
frozenlist==1.8.0
fsspec==2026.1.0
h11==0.16.0
httpcore==1.0.9
httpx==0.28.1
idna==3.11
ipykernel==7.1.0
ipympl==0.9.8
ipython==9.9.0
ipython_pygments_lexers==1.1.1
ipywidgets==8.1.8
isoduration==20.11.0
jedi==0.19.2
Jinja2==3.1.6
json5==0.13.0
jsonpointer==3.0.0
jsonschema==4.26.0
jsonschema-specifications==2025.9.1
jupyter-collaboration==4.2.0
jupyter-collaboration-ui==2.2.0
jupyter-docprovider==2.2.0
jupyter-events==0.12.0
jupyter-lsp==2.3.0
jupyter-server-ydoc==2.2.0
jupyter-ydoc==3.3.4
jupyter_client==8.8.0
jupyter_core==5.9.1
jupyter_server==2.17.0
jupyter_server_fileid==0.9.3
jupyter_server_proxy==4.4.0
jupyter_server_terminals==0.5.4
jupyterlab==4.5.2
jupyterlab_pygments==0.3.0
jupyterlab_server==2.28.0
jupyterlab_widgets==3.0.16
kiwisolver==1.4.9
lark==1.3.1
locket==1.0.0
MarkupSafe==3.0.3
matplotlib==3.10.8
matplotlib-inline==0.2.1
mistune==3.2.0
msgpack==1.1.2
multidict==6.7.0
narwhals==2.15.0
nbclient==0.10.4
nbconvert==7.16.6
nbformat==5.10.4
nest-asyncio==1.6.0
notebook==7.5.2
notebook_shim==0.2.4
numpy==2.4.1
packaging==25.0
pandas==3.0.0
pandocfilters==1.5.1
parso==0.8.5
partd==1.4.2
pexpect==4.9.0
pillow==12.1.0
platformdirs==4.5.1
prometheus_client==0.24.1
prompt_toolkit==3.0.52
propcache==0.4.1
psutil==7.2.1
ptyprocess==0.7.0
pure_eval==0.2.3
pycparser==3.0
pycrdt==0.12.44
pycrdt-store==0.1.3
pycrdt-websocket==0.16.0
Pygments==2.19.2
pyparsing==3.3.2
python-dateutil==2.9.0.post0
python-json-logger==4.0.0
PyYAML==6.0.3
pyzmq==27.1.0
referencing==0.37.0
requests==2.32.5
rfc3339-validator==0.1.4
rfc3986-validator==0.1.1
rfc3987-syntax==1.1.0
rpds-py==0.30.0
Send2Trash==2.1.0
setuptools==80.10.1
simpervisor==1.0.0
six==1.17.0
sortedcontainers==2.4.0
soupsieve==2.8.3
sqlite-anyio==0.2.3
stack-data==0.6.3
tblib==3.2.2
terminado==0.18.1
tinycss2==1.4.0
toolz==1.1.0
tornado==6.5.4
traitlets==5.14.3
typing_extensions==4.15.0
tzdata==2025.3
uri-template==1.3.0
urllib3==2.6.3
wcwidth==0.2.14
webcolors==25.10.0
webencodings==0.5.1
websocket-client==1.9.0
widgetsnbextension==4.0.15
xyzservices==2025.11.0
yarl==1.22.0
zict==3.0.0

After extensive testing, I’ve confirmed that if any unsaved notebooks are open, the server shuts down but the jupyter lab command never returns, leaving an empty slurm job running until it hits the wallclock limit.
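As a defensive measure at the script.sh level, one could run the lab in the background and trap termination, so that a scancel (or the wrapper exiting) tears down a wedged jupyter lab instead of holding the allocation. This is only a sketch of the pattern, not something I have tested against the actual race; a sleep stands in for jupyter lab --config="${CONFIG_FILE}" so it runs anywhere:

```shell
#!/bin/bash
# Sketch: teardown pattern for script.sh so the batch step can reap a
# wedged child. `sleep` stands in for `jupyter lab --config="${CONFIG_FILE}"`.

sleep 300 &                 # stand-in for the jupyter lab command
CHILD=$!

# On SIGTERM (what scancel delivers) or normal exit, kill the child so
# the step does not linger until the wallclock limit.
trap 'kill -TERM "${CHILD}" 2>/dev/null || true' EXIT TERM INT

kill -TERM $$ &             # simulate an scancel for this demo
wait "${CHILD}" 2>/dev/null || true
echo "child reaped, wrapper exiting"
```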

I have confirmed that (with these versions of python/jupyter etc) the problem happens irrespective of OnDemand even when running Jupyter alone.

If the user is diligent about saving all notebooks (or waits for the autosave to kick in before shutting down the server), the problem does not occur.

And after even more substantial testing, I have tracked this down to another side effect of Serious Jupyter problem in OnDemand; the same solution as for that issue (avoiding jupyter-collaboration and friends) solves this one too.
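Concretely, for an environment like the pip freeze above, that means removing the collaboration stack (a sketch; whether the ydoc/pycrdt dependencies can also go depends on what else in your environment needs them):

```shell
# Remove jupyter-collaboration and friends from the Jupyter environment.
# Package names are taken from the pip freeze earlier in this thread.
python3 -m pip uninstall -y \
    jupyter-collaboration \
    jupyter-collaboration-ui \
    jupyter-docprovider \
    jupyter-server-ydoc
```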