Slurm Error when Launching Jupyter

Hi all,

We are running an LSF-based OnDemand instance, and we would like to add a new Slurm cluster to that same instance.

The basic setup works. The terminal loads fine, and we can interact with the head node of the Slurm cluster.

However, when we try to launch a Jupyter interactive application, we see the following error:

sbatch: error: slurm_persist_conn_open: Something happened with the receiving/processing of the persistent connection init message to svlpslurm02:6821: Unable to connect to database
sbatch: error: Sending PersistInit msg: Unable to connect to database
sbatch: error: slurm_persist_conn_open: Something happened with the receiving/processing of the persistent connection init message to svlpslurm02:6821: Unable to connect to database
sbatch: error: Sending PersistInit msg: Unable to connect to database
sbatch: error: DBD_GET_CLUSTERS failure: Unable to connect to database
sbatch: error: Problem talking to database
sbatch: error: There is a problem talking to the database: Unable to connect to database.  Only local cluster communication is available, remove --cluster from your command line or contact your admin to resolve the problem

What could be going on here? Any clues on how to get past this?

I do not use SELinux. I checked sestatus and it shows disabled.

Thanks,
Walid

I’d open a shell session on the machine that hosts OnDemand and run sbatch commands directly from the CLI. You’re likely to get the same error messages, but doing so takes OnDemand out of the equation and narrows down what you need to look at while troubleshooting.
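As a rough sketch of that check, something like the following could be run as a regular user on the OnDemand host. It submits a trivial job with and without the `-M` (`--clusters`) option that the error message alludes to, and reports whether the "Unable to connect to database" text shows up. The cluster name `mycluster` and the helper names are placeholders of mine, not anything from your setup, and note that a successful run submits a real (tiny) job.

```python
import shutil
import subprocess

# Text taken from the sbatch errors quoted in the original post.
DB_ERROR = "Unable to connect to database"


def build_sbatch_cmd(cluster=None):
    """Build a trivial sbatch command; add -M <cluster> when a cluster is given."""
    cmd = ["sbatch", "--wrap", "hostname"]
    if cluster:
        # Insert the cluster option right after the sbatch executable.
        cmd[1:1] = ["-M", cluster]
    return cmd


def is_db_error(stderr_text):
    """Return True if sbatch's stderr contains the database connection error."""
    return DB_ERROR in stderr_text


def try_submit(cluster=None):
    """Run sbatch and return (exit code, saw_db_error), or None if sbatch is missing."""
    if shutil.which("sbatch") is None:
        return None
    result = subprocess.run(
        build_sbatch_cmd(cluster), capture_output=True, text=True
    )
    return result.returncode, is_db_error(result.stderr)


if __name__ == "__main__":
    # "mycluster" is a hypothetical name; substitute the real cluster name.
    for cluster in (None, "mycluster"):
        print("cluster =", cluster, "->", try_submit(cluster))
```

If the error only appears when the cluster option is passed, that would point at the accounting database connection rather than at OnDemand itself.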

That said, I’m not sure what your issue is. A quick Google search turned up this thread:
https://lists.schedmd.com/pipermail/slurm-users/2017-November/000179.html

Once you have reproduced the error in the CLI, you may have better luck asking in a Slurm forum such as their user mailing list.
