I have a user submitting a starccm job via the “job composer”.
From the terminal/command line, running sbatch on the job script works as expected.
When we take the same script and submit it from the job composer, the job errors out:
An ORTE daemon has unexpectedly failed after launch and before
communicating back to mpirun. This could be caused by a number
of factors, including an inability to create a connection back
to mpirun due to a lack of common network interfaces and/or no
route found between them. Please check network connectivity
(including firewalls and network routing requirements).
This error comes from Open MPI, which starccm uses by default. If I change starccm to -mpi intel and submit via the job composer, it works.
The environment shell appears to be bash all the way through. Any suggestions on where this is getting broken?
Thanks for the response. I tried adding #SBATCH --export=ALL, but it gives the same error.
It has to be the environment. Maybe source /etc/profile and /etc/bash and see if that changes anything.
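One way to test the environment theory is to dump the environment from inside the job and compare a terminal submission against a job composer submission. This is only a minimal sketch, assuming bash and the standard env, sort, and comm utilities. The function names dump_env and diff_env are made up for illustration and are not part of OOD or Slurm.

```shell
#!/bin/bash
# Hypothetical helpers for comparing job environments (names are illustrative).

# Write the current environment, sorted in the C locale, to the file given as $1.
dump_env() {
    env | LC_ALL=C sort > "$1"
}

# Print lines that appear in only one of two sorted dumps.
# Lines unique to the first file are unindented; lines unique to the
# second file are prefixed with a tab. Lines common to both are suppressed.
diff_env() {
    LC_ALL=C comm -3 "$1" "$2"
}
```

Calling dump_env "$HOME/env.$SLURM_JOB_ID.txt" near the top of the job script, once for each submission route, and then running diff_env on the two files should show which variables differ. Values that span several lines will confuse this line-based comparison, so treat the output as a hint rather than an exact diff.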
It would be something in ood_core, which I need to check to understand this. The exact place where that is being set is here:
So it is odd that --export=ALL won’t work given what I see there. What version of OOD are you on? Would you be able to post the script or its relevant portions?
Yeah, looking into it, it looks like this issue already has a fix in place for what you need:
Rebuilding the app off master adds an option under “Job Options” to copy the environment. It is not in the 2.0 release branch, and as of now there are no plans to backport it, though I am going to add it to the 2.1 milestone.
I wanted to update that this will be in 2.1, but until then I had another idea. Are the jobs for the job composer app all going to their own cluster and using their own clusters.d file? If so, what about trying to set the env using bin_overrides in the cluster.d/job_composer_cluster.yml config file: