OpenMPI does not work in Open OnDemand

Hi Support,

I submitted a Slurm job via the “Job Composer”, and it fails with the following error:

An ORTE daemon has unexpectedly failed after launch and before
communicating back to mpirun. This could be caused by a number
of factors, including an inability to create a connection back
to mpirun due to a lack of common network interfaces and/or no
route found between them. Please check network connectivity
(including firewalls and network routing requirements).

The Slurm job script is shown below:

#!/bin/bash

#SBATCH --partition=jobqueue
#SBATCH --time=1:00:00
#SBATCH --nodes=4
#SBATCH --ntasks=4
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=4096M
#SBATCH -J "python-example"

module load Anaconda3/2023.03-1
source activate jn-2303
module load gnu12/12.2.0 openmpi4/4.1.4

mpirun -np 4 python3 pythonmpi.py > result.txt

source deactivate
module unload Anaconda3/2023.03-1 gnu12/12.2.0 openmpi4/4.1.4

However, when I submit the same script with sbatch from a terminal on the Open OnDemand server, it works as expected.

I then found a workaround in the post “Job composer and star-ccm+”: adding “export SLURM_EXPORT_ENV=ALL” to the job script. With that line added, the job works.
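For reference, here is a minimal sketch of where the workaround line sits in the job script. It assumes the Job Composer submits with the equivalent of `sbatch --export=NONE`. The sbatch documentation notes that job steps can inherit that export setting, and recommends setting SLURM_EXPORT_ENV to ALL in the job script to prevent this. Only the `export` line is the actual workaround. The two `echo` lines are an added diagnostic, and the #SBATCH header is trimmed from the script above.

```shell
#!/bin/bash
#SBATCH --partition=jobqueue
#SBATCH --nodes=4
#SBATCH --ntasks=4
#SBATCH -J "python-example"

# If sbatch was called with --export=NONE, job steps started inside this
# script can inherit that setting via SLURM_EXPORT_ENV. Under Slurm, Open MPI's
# mpirun typically starts its ORTE daemons as such a step, so they would lose
# the modules and conda environment loaded below. Reset it to ALL first.
echo "SLURM_EXPORT_ENV on entry: ${SLURM_EXPORT_ENV:-<unset>}"
export SLURM_EXPORT_ENV=ALL
echo "SLURM_EXPORT_ENV for job steps: ${SLURM_EXPORT_ENV}"

# The module load / source activate / mpirun lines from the original
# script follow here, unchanged.
```

Placing the export before `module load` is not strictly required. It only needs to run before `mpirun`, but putting it at the top makes it hard to miss.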

Can Open OnDemand enable SLURM_EXPORT_ENV=ALL by default, so that I don’t need to add “export SLURM_EXPORT_ENV=ALL” to every job script? We are running Open OnDemand 3.0.1.

Thanks

If you want this to take effect for every job, you could set it globally in the app’s submit.yml.erb instead.
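A minimal sketch of such a setting, written as a shell heredoc that creates a submit.yml.erb for one batch connect app. It assumes your ood_core version supports the `copy_environment` script option, which the Slurm adapter is expected to turn into `sbatch --export=ALL`. The `APP_DIR` variable and the `./my_app` directory are hypothetical placeholders, and this is not OSC’s actual configuration.

```shell
#!/bin/bash
# Hypothetical app directory; point APP_DIR at the real batch connect app.
app_dir="${APP_DIR:-./my_app}"
target="$app_dir/submit.yml.erb"
mkdir -p "$app_dir"

if [ -e "$target" ]; then
  # Don't clobber an existing file; merge the script: block in by hand.
  echo "$target already exists; add the script: block to it manually" >&2
else
  # copy_environment asks OOD to submit with the user's environment copied
  # (assumed to map to sbatch --export=ALL with the Slurm adapter).
  cat > "$target" <<'EOF'
---
script:
  copy_environment: true
EOF
fi

cat "$target"
```

This only affects the app whose directory holds the file. It does not change how the Job Composer submits.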

Here’s an example of what we do at OSC:


Hi Support,

I ran into the problem in the “Job Composer”, not in an interactive app, and the Job Composer doesn’t seem to have a “submit.yml.erb” file. Do you know where to apply this setting globally for the Job Composer? Thanks

Sorry about that. For the Job Composer, I think your original approach of exporting the variable in the job script is the way to handle this.

The Job Composer has since been updated to accommodate this: there is now a “copy environment” checkbox, which you can set when editing the job options.

Alternatively, you can set these options globally for the entire cluster. Note, though, that this affects not only the Job Composer but all batch connect applications as well.

https://osc.github.io/ood-documentation/latest/how-tos/app-development/interactive/submit.html#setting-batch-connect-options-globally
