I don’t have the output.log in the file explorer when I click the session id.
To clarify a bit more, our login node can read/write to the $HOME directories. The compute nodes can only read from the $HOME directories, but their writes happen through an overlayfs mount.
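In case it helps anyone reproduce this, here is a rough sketch for checking which filesystem backs a given path on a node, so the login node and a compute node can be compared. It assumes a Linux node where the mount table is readable at /proc/mounts. The helper name `fs_type_for` is made up for illustration and is not part of OnDemand.

```python
import os


def fs_type_for(path, mounts_text):
    """Return (mount_point, fs_type) for the mount that contains `path`.

    `mounts_text` is the content of a mount table in /proc/mounts format:
    device, mount point, filesystem type, options, ...
    Returns None if no mount point matches.
    """
    path = os.path.abspath(path)
    best = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        mount_point, fs_type = fields[1], fields[2]
        # Match on whole path components so /home does not match /homework.
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            # Prefer the longest mount point. On a tie, the later entry wins,
            # since a later mount on the same directory sits on top.
            if best is None or len(mount_point) >= len(best[0]):
                best = (mount_point, fs_type)
    return best


# Example use on a node (assumes Linux):
#   with open("/proc/mounts") as f:
#       print(fs_type_for(os.path.expanduser("~"), f.read()))
```

Running this against the home directory on both node types should show whether the compute node sees an overlay mount where the login node sees the shared filesystem. Mount points containing spaces are escaped in /proc/mounts, and this sketch does not handle them.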
I had to log in to the compute node to get the output.log:
cat output.log
/var/spool/slurm/d/job1572404/slurm_script: line 3: module: command not found
Script starting...
Waiting for Jupyter server to open port 7449...
TIMING - Starting main script at: Mon May 18 09:48:34 PDT 2020
TIMING - Starting jupyter at: Mon May 18 09:48:34 PDT 2020
+ jupyter-lab --config=/home/rcwhite/ondemand/data/sys/dashboard/batch_connect/dev/jupyter_test/output/694741ad-a664-4eee-924a-c0db3dd9961b/config.py
[W 09:48:35.689 LabApp] WARNING: The notebook server is listening on all IP addresses and not using encryption. This is not recommended.
[W 09:48:35.700 LabApp] JupyterLab server extension not enabled, manually loading...
[I 09:48:35.707 LabApp] JupyterLab extension loaded from /usr/local/lib/python3.6/site-packages/jupyterlab
[I 09:48:35.707 LabApp] JupyterLab application directory is /usr/local/share/jupyter/lab
[I 09:48:35.711 LabApp] Serving notebooks from local directory: /home/rcwhite
[I 09:48:35.711 LabApp] The Jupyter Notebook is running at:
[I 09:48:35.711 LabApp] http://(maz044 or 127.0.0.1):7449/node/maz044/7449/
[I 09:48:35.711 LabApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
Discovered Jupyter server listening on port 7449!
Generating connection YAML file...
Would the type of configuration we have cause the apps to stall? Is the home directory used as a sort of IPC?
We have BeeGFS shares that are writable by all systems; only our home directory has this caveat.
Make sure the compute nodes are syncing their writes properly. connection.yml is generated directly on the compute node. If writes from the compute nodes aren’t synced to your file system quickly enough, that would explain the stall: the login node is waiting for connection.yml to exist.
The home directory does act as a sort of IPC here. For example, the login node is looking for /jupyter_test/output/694741ad-a664-4eee-924a-c0db3dd9961b/connection.yml
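To illustrate the kind of wait involved, here is a minimal sketch of polling for a file with a timeout. This is not OnDemand's actual code; the function name `wait_for_file` and the default timeout and interval values are assumptions made up for the example.

```python
import os
import time


def wait_for_file(path, timeout=60.0, interval=1.0):
    """Poll until `path` exists or `timeout` seconds have passed.

    Returns True if the file appeared in time, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

If the compute node writes connection.yml into an overlay that the login node never sees, a loop like this on the login side can only time out, which matches the stall described above.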
Thank you for the clarification. I had figured it was something along those lines. The group I work with will meet next Friday and discuss setting up read/write access to the home directory across the cluster as a whole.
Hey @romxero, hope you resolved your overlay issue.
There’s a way to set this directory to be something else in case you’re interested and/or need to. You set this environment variable in the /etc/ood/config/apps/dashboard/env file.