Jupyter launch fails for UGE

I'm running the OOD-installed Jupyter app in my sandbox. When it launches, it times out after about a minute. Is it a connection issue?

System and scheduler info:

UGE 8.6.6
CentOS Linux release 7.4
OnDemand version: v1.6.20 | Dashboard version: v1.35.3

Security settings on compute node

firewalld disabled
selinux permissive

/home/thomasbr/ondemand/dev/jupyter/form.yml file:


cluster: "ivy"
modules: "intel python3"
extra_jupyter_args: ""
form:
  - modules
  - extra_jupyter_args
  - bc_account
  - bc_queue
  - bc_num_hours
  - bc_num_slots
  - bc_email_on_started

/home/thomasbr/ondemand/dev/jupyter/submit.yml.erb file (this is where the files ended up when the jupyter-bc stuff was installed; I'm not sure whether it needs to be moved to /submit):


batch_connect:
script:
  queue_name: "ondemand"
  accounting_id: "communitycluster"
  job_name: "jupyter_interactive"
  native:
    - "-pe"
    - "sm"
    - "4"
    - "-S"
    - "/bin/bash"

output log file:

Script starting...
Waiting for Jupyter Notebook server to open port 64356...
TIMING - Starting wait at: Wed Apr 1 11:25:19 CDT 2020
TIMING - Starting main script at: Wed Apr 1 11:25:19 CDT 2020
TIMING - Starting jupyter at: Wed Apr 1 11:25:19 CDT 2020
Timed out waiting for Jupyter Notebook server to open port 64356!
TIMING - Wait ended at: Wed Apr 1 11:26:22 CDT 2020
Cleaning up...
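
For context, the timeout in this log comes from the job script polling the port until something is listening on it. A minimal Python sketch of that kind of check is below. It is not OnDemand's actual code, and wait_for_port and its parameters are names made up for illustration; it may still be handy for testing a port by hand from the compute node.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Poll host:port until a TCP connection succeeds or `timeout` seconds pass.

    Returns True if a connection was accepted before the deadline, else False.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means some server is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused, timed out, or unreachable: retry below.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

If Jupyter crashes during startup, as in the traceback that follows, nothing ever listens on the port, so a check like this can only time out.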

scheduler error file:

/export/uge/default/spool/compute-19-10/job_scripts/745458: line 3: module: command not found
/home/thomasbr/ondemand/data/sys/dashboard/batch_connect/dev/jupyter/output/8656b32b-1419-462b-90e1-db491d1b2225/script.sh: line 14: module: command not found
/home/thomasbr/ondemand/data/sys/dashboard/batch_connect/dev/jupyter/output/8656b32b-1419-462b-90e1-db491d1b2225/script.sh: line 17: module: command not found
/home/thomasbr/ondemand/data/sys/dashboard/batch_connect/dev/jupyter/output/8656b32b-1419-462b-90e1-db491d1b2225/script.sh: line 20: module: command not found

+ jupyter notebook --config=/home/thomasbr/ondemand/data/sys/dashboard/batch_connect/dev/jupyter/output/8656b32b-1419-462b-90e1-db491d1b2225/config.py
Traceback (most recent call last):
  File "/bin/jupyter-notebook", line 8, in <module>
    sys.exit(main())
  File "/usr/lib/python2.7/site-packages/jupyter_core/application.py", line 270, in launch_instance
    return super(JupyterApp, cls).launch_instance(argv=argv, **kwargs)
  File "/usr/lib/python2.7/site-packages/traitlets/config/application.py", line 663, in launch_instance
    app.initialize(argv)
  File "", line 2, in initialize
  File "/usr/lib/python2.7/site-packages/traitlets/config/application.py", line 87, in catch_config_error
    return method(app, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/notebook/notebookapp.py", line 1630, in initialize
    self.init_webapp()
  File "/usr/lib/python2.7/site-packages/notebook/notebookapp.py", line 1378, in init_webapp
    self.jinja_environment_options,
  File "/usr/lib/python2.7/site-packages/notebook/notebookapp.py", line 159, in __init__
    default_url, settings_overrides, jinja_env_options)
  File "/usr/lib/python2.7/site-packages/notebook/notebookapp.py", line 252, in init_settings
    allow_remote_access=jupyter_app.allow_remote_access,
  File "/usr/lib/python2.7/site-packages/traitlets/traitlets.py", line 556, in __get__
    return self.get(obj, cls)
  File "/usr/lib/python2.7/site-packages/traitlets/traitlets.py", line 535, in get
    value = self._validate(obj, dynamic_default())
  File "/usr/lib/python2.7/site-packages/notebook/notebookapp.py", line 867, in _default_allow_remote
    for info in socket.getaddrinfo(self.ip, self.port, 0, socket.SOCK_STREAM):
socket.gaierror: [Errno -2] Name or service not known

It looks like a network interface issue. We configure Jupyter to bind to the IP *, which means any interface. I can’t tell for sure what that Python error is, but I’d guess it’s a DNS issue, for example the node can’t resolve the loopback address or ::1. Are you allowed to open ports on your compute nodes?

When I run a quick test, I get this.

>>> import socket;
>>> socket.getaddrinfo("*", 8081, 0, socket.SOCK_STREAM)
[(10, 1, 6, '', ('::1', 8081, 0, 0)), (2, 1, 6, '', ('127.0.0.1', 8081))]

You could try setting c.NotebookApp.ip = '0.0.0.0' to be more explicit and see if that changes anything.
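
To compare how the two bind values resolve on a given node, something like the sketch below could be run there. It makes the same getaddrinfo call that fails in the traceback; check_resolution and report are hypothetical helper names, and port 8081 is just the example port from the quick test.

```python
import socket


def check_resolution(host, port=8081):
    """Return (True, sorted IPs) if `host` resolves, else (False, error text)."""
    try:
        infos = socket.getaddrinfo(host, port, 0, socket.SOCK_STREAM)
    except socket.gaierror as exc:
        # On Linux, errno -2 is EAI_NONAME: "Name or service not known".
        return False, "gaierror {}: {}".format(exc.errno, exc.strerror)
    # Each entry is (family, type, proto, canonname, sockaddr); keep the IPs.
    return True, sorted({info[4][0] for info in infos})


def report(hosts, port=8081):
    """Print one line per candidate bind address."""
    for host in hosts:
        ok, detail = check_resolution(host, port)
        print("{!r}: {} {}".format(host, "OK" if ok else "FAILED", detail))
```

Calling report(["*", "0.0.0.0"]) under the same Python that runs Jupyter should show whether "*" is the value that fails to resolve. A numeric address such as 0.0.0.0 needs no name lookup at all.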

We are allowed to open ports on the compute node. On the compute node I get the same thing.

>>> import socket
>>> socket.getaddrinfo("*", 8081, 0, socket.SOCK_STREAM)
[(10, 1, 6, '', ('::1', 8081, 0, 0)), (2, 1, 6, '', ('127.0.0.1', 8081))]

I think it’s a name resolution issue rather than a permissions one. If you google the phrase “Name or service not known” (the error string you received), the results are all about programs failing to resolve a hostname through DNS.

Did you try setting c.NotebookApp.ip = '0.0.0.0' in your before.sh.erb? I see several answers on Stack Overflow and on the project’s GitHub saying that’s the fix.

It works! You can close the ticket.

Thanks