Job Composer and STAR-CCM+

We have a user submitting a STAR-CCM+ job via the Job Composer.

From the terminal/command line, submitting the job script with sbatch works as expected.
Take the same script and submit it from the Job Composer, and the job errors out:

An ORTE daemon has unexpectedly failed after launch and before
communicating back to mpirun. This could be caused by a number
of factors, including an inability to create a connection back
to mpirun due to a lack of common network interfaces and/or no
route found between them. Please check network connectivity
(including firewalls and network routing requirements).

This error relates to the OpenMPI that STAR-CCM+ uses by default. Changing STAR-CCM+ to -mpi intel and submitting via the Job Composer, it works.

The environment shell appears to be bash all the way through. Any suggestions on where this is getting broken?

Sorry for the issue. Reading what you have here, I wonder if the fix is similar to one from a previous user’s issue:

The solution seemed to be:

Which might be why the environment is not what is expected. What happens when you add that line and submit in the Job Composer?

Thanks for the response. I tried adding #SBATCH --export=ALL; it gives the same error.
It has to be the environment. Maybe source /etc/profile and /etc/bash in the script and see.
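For what it's worth, a minimal way to try that at the top of the job script (a sketch only; the paths are the usual system/user defaults and may differ on your site):

# Sketch: pull the login-shell environment in explicitly before anything else.
[ -f /etc/profile ] && source /etc/profile
[ -f "$HOME/.bashrc" ] && source "$HOME/.bashrc"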

After adding #SBATCH --export=ALL, the job is still submitted with SLURM_EXPORT_ENV=NONE.
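One quick way to confirm what the job actually ran with is to print the export policy and a couple of variables from inside the job script before the solver launches (a diagnostic sketch; SLURM_EXPORT_ENV is the standard Slurm input variable, the rest are ordinary shell variables):

# Diagnostic: show the export policy the job was submitted with.
echo "SLURM_EXPORT_ENV=${SLURM_EXPORT_ENV:-<unset>}"
# Compare variables that exist in an interactive shell but may be missing
# when the environment is not propagated.
echo "PATH=$PATH"
echo "LD_LIBRARY_PATH=${LD_LIBRARY_PATH:-<unset>}"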

What code does the submittal?
/var/www/ood/apps/sys/myjobs

It would be something in ood_core that I need to check to understand this, and the exact place where that is being set is here:

So it is odd that --export=ALL won’t work given what I see there. What version of OOD are you on? Would you be able to post the script, or the relevant portions?

We were running 2.0.20 and just upgraded to 2.0.28 to see if it was addressed (worth a try, but still the same issue).

#!/bin/bash

#SBATCH --export=ALL
#SBATCH --get-user-env

# Clerical tracking information
#SBATCH --job-name="Test"
#SBATCH --comment="STAR-CCM+ Solution for JOBNAME"
#SBATCH --account="no-code"


#SBATCH --nodes=2   # number of nodes
#SBATCH --ntasks=8   # total number of processor cores
#SBATCH --partition=ondemand   # Cluster rack number



#source /etc/profile
#source ~/.bash_profile
#source ~/.bashrc

# Load modules
#module purge
#module avail
#module load modules
#module use /opt/Software/corvid/.modulefiles
#module load star-ccm/16.06.008-R8
#module avail

### Vars available from the CLI session but not included under the Job Composer
export SSH_CLIENT=
export CHROOTDOR=/var/chroots/sl7
export I_MPI_PMI_LIBRARY=/usr/lib64/libpmi.so
export LANGUAGE_TERRITORY=en_US
export SSH_TTY=/dev/pts/0
export VNFSROOT=sl7
export XMODIFIERS=@im=none
export LANG=en_US.UTF-8
export TERM=linux
export SELINUX_ROLE_REQUESTED=


# Log output
printf "Job debugging info:\n"
echo "----"
printf "Starting starccm+ simulation run at "
date
echo "----"

export SIM_TITLE="DuctTestRun12Copy.sim"

# Create machine file and submit
MACHINEFILE=$(generate_pbs_nodefile)

env
cd /home/corvid/jwaters/Documents/slurm/Star_RUN

### Prefer openmpi
#/opt/Software/corvid/star-ccm/16.06.008-R8/STAR-CCM+16.06.008-R*/star/bin/starccm+ -power -machinefile "$MACHINEFILE" -np $SLURM_NTASKS -batch "$SIM_TITLE"
/opt/Software/corvid/star-ccm/16.06.008-R8/STAR-CCM+16.06.008-R*/star/bin/starccm+ -power -mpi "openmpi" -machinefile "$MACHINEFILE" -np $SLURM_NTASKS -batch "$SIM_TITLE"

### Works
/opt/Software/corvid/star-ccm/16.06.008-R8/STAR-CCM+16.06.008-R*/star/bin/starccm+ -power -mpi intel -machinefile "$MACHINEFILE" -np $SLURM_NTASKS -batch "$SIM_TITLE"

Possibly a similar issue from 2 years ago:

Yeah, looking into it, it looks like this issue has a fix currently in place for what you need:

Rebuilding the app off master gives an option under “Job Options” to copy the environment, but it is not in the 2.0 release branch, and as of now there are no plans to backport this, though I am going to add it to the 2.1 milestone.

I wanted to update that this will be in 2.1, but until then I had another idea. Are the jobs from the Job Composer app all going to their own cluster, using their own clusters.d file? If so, what about trying to set the env using bin_overrides in the clusters.d/job_composer_cluster.yml config file:

https://osc.github.io/ood-documentation/latest/installation/cluster-config-schema.html#bin-overrides

And the needed Adapter:

https://osc.github.io/ood-documentation/latest/installation/resource-manager/slurm.html
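A rough sketch of what that could look like: point bin_overrides for sbatch at a small wrapper script that forces the environment to be exported (the wrapper path and filename below are made up for illustration, and /usr/bin/sbatch should be whatever the real binary path is on your submit host):

#!/bin/bash
# Hypothetical wrapper, e.g. /usr/local/bin/sbatch_export_all, referenced as
# the sbatch entry under bin_overrides in the clusters.d YAML.
# Override any SLURM_EXPORT_ENV=NONE set by the caller, and pass --export=ALL
# as well for good measure.
export SLURM_EXPORT_ENV=ALL
exec /usr/bin/sbatch --export=ALL "$@"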

Let me know if this looks like it might work or if you have any questions around it.

Able to get STAR-CCM+ to run.

The workaround is to prefix the command with SLURM_EXPORT_ENV=ALL.
We also have a similar issue with srun; there, use srun --export=ALL.

SLURM_EXPORT_ENV=ALL /opt/Software/corvid/star-ccm/16.06.008-R8/STAR-CCM+16.06.008-R*/star/bin/starccm+ -power -mpi openmpi -machinefile "$MACHINEFILE" -np $SLURM_NTASKS -batch "$SIM_TITLE"
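For the srun case mentioned above, the equivalent inside the job script would look something like this (a sketch; ./my_mpi_program stands in for whatever the job step actually runs):

# Force srun to export the full environment for the job step.
srun --export=ALL --ntasks=$SLURM_NTASKS ./my_mpi_program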
