Hi Support,
I submitted a Slurm job via the Job Composer, but it fails with the following error:
An ORTE daemon has unexpectedly failed after launch and before
communicating back to mpirun. This could be caused by a number
of factors, including an inability to create a connection back
to mpirun due to a lack of common network interfaces and/or no
route found between them. Please check network connectivity
(including firewalls and network routing requirements).
The content of the Slurm job script is shown below:
#!/bin/bash
#SBATCH --partition=jobqueue
#SBATCH --time=1:00:00
#SBATCH --nodes=4
#SBATCH --ntasks=4
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=4096M
#SBATCH -J "python-example"
module load Anaconda3/2023.03-1
source activate jn-2303
module load gnu12/12.2.0 openmpi4/4.1.4
mpirun -np 4 python3 pythonmpi.py > result.txt
source deactivate
module unload Anaconda3/2023.03-1 gnu12/12.2.0 openmpi4/4.1.4
However, when I submit the same job script with sbatch from a terminal on the Open OnDemand server, it works as expected.
I then found a workaround in the post “Job composer and star-ccm+”: adding “export SLURM_EXPORT_ENV=ALL” to the job script. With that line added, the job submitted from the Job Composer runs correctly.
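A minimal sketch of the workaround, assuming the export line is placed after the #SBATCH directives and before the module loads and mpirun. The placement is an assumption, since the line only needs to take effect before the job steps are launched. The echo line is also an addition, included only so the value appears in the job output for checking.

```shell
#!/bin/bash
#SBATCH --partition=jobqueue
#SBATCH --nodes=4
#SBATCH --ntasks=4

# Workaround: let the job steps launched below inherit the job environment.
# Placement before the module loads and mpirun is an assumption.
export SLURM_EXPORT_ENV=ALL

# Print the value so it can be checked in the job's output file.
echo "SLURM_EXPORT_ENV=${SLURM_EXPORT_ENV}"

# The module load, source activate, and mpirun lines follow here unchanged.
```

Comparing the echoed value between a Job Composer submission and a terminal submission (with the export line temporarily removed) may help confirm whether the two submission paths differ in their environment export settings.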
Is there a way to have Open OnDemand set SLURM_EXPORT_ENV=ALL by default, so that I don’t need to add “export SLURM_EXPORT_ENV=ALL” to every job script? The Open OnDemand version is 3.0.1.
Thanks