Job Composer and STAR-CCM+

We have a user submitting a STAR-CCM+ job via the Job Composer.

From the terminal/command line, submitting the job script with sbatch works as expected.
Take the same script and submit it from the Job Composer, and the job errors out:

An ORTE daemon has unexpectedly failed after launch and before
communicating back to mpirun. This could be caused by a number
of factors, including an inability to create a connection back
to mpirun due to a lack of common network interfaces and/or no
route found between them. Please check network connectivity
(including firewalls and network routing requirements).

This error relates to the OpenMPI that STAR-CCM+ uses by default. Changing STAR-CCM+ to -mpi intel and submitting via the Job Composer, it works.

The environment shell appears to be bash all the way through. Any suggestions on where this is getting broken?

Sorry for the issue. Reading what you have here, I wonder if the fix is similar to one from a previous user’s issue:

The solution seemed to be:

Which might be why the environment is not what is expected. What happens when you add that line and submit in the Job Composer?

Thanks for the response. I tried adding #SBATCH --export=ALL; it gives the same error.
It has to be the environment. Maybe source /etc/profile and /etc/bash in the script and see.
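For what it's worth, a minimal way to try that at the top of the job script (a sketch only; the paths are the usual system/user defaults and may differ on your site):

# Sketch: pull the login-shell environment in explicitly before anything else.
[ -f /etc/profile ] && source /etc/profile
[ -f "$HOME/.bashrc" ] && source "$HOME/.bashrc"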

After adding #SBATCH --export=ALL, the job is still submitted with SLURM_EXPORT_ENV=NONE.
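One quick way to confirm what the job actually ran with is to print the export policy and a couple of variables from inside the job script before the solver launches (a diagnostic sketch; SLURM_EXPORT_ENV is the standard Slurm input variable, the rest are ordinary shell variables):

# Diagnostic: show the export policy the job was submitted with.
echo "SLURM_EXPORT_ENV=${SLURM_EXPORT_ENV:-<unset>}"
# Compare variables that exist in an interactive shell but may be missing
# when the environment is not propagated.
echo "PATH=$PATH"
echo "LD_LIBRARY_PATH=${LD_LIBRARY_PATH:-<unset>}"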

What code does the submittal?
/var/www/ood/apps/sys/myjobs

It would be something in ood_core that I need to check to understand this, and the exact place where that is being set is here:

So it is odd that --export=ALL won’t work given what I see there. What version of OOD are you on? Would you be able to post the script, or the relevant portions?

We were running 2.0.20 and just upgraded to 2.0.28 to see if it was addressed (worth a try, but still the same issue).

#!/bin/bash

#SBATCH --export=ALL
#SBATCH --get-user-env

# Clerical tracking information
#SBATCH --job-name="Test"
#SBATCH --comment="STAR-CCM+ Solution for JOBNAME"
#SBATCH --account="no-code"


#SBATCH --nodes=2   # number of nodes
#SBATCH --ntasks=8   # total number of processor cores
#SBATCH --partition=ondemand   # Cluster rack number



#source /etc/profile
#source ~/.bash_profile
#source ~/.bashrc

# Load modules
#module purge
#module avail
#module load modules
#module use /opt/Software/corvid/.modulefiles
#module load star-ccm/16.06.008-R8
#module avail

### Vars available from the CLI session but not included under the Job Composer
export SSH_CLIENT=
export CHROOTDOR=/var/chroots/sl7
export I_MPI_PMI_LIBRARY=/usr/lib64/libpmi.so
export LANGUAGE_TERRITORY=en_US
export SSH_TTY=/dev/pts/0
export VNFSROOT=sl7
export XMODIFIERS=@im=none
export LANG=en_US.UTF-8
export TERM=linux
export SELINUX_ROLE_REQUESTED=


# Log output
printf "Job debugging info:\n"
echo "----"
printf "Starting starccm+ simulation run at "
date
echo "----"

export SIM_TITLE="DuctTestRun12Copy.sim"

# Create machine file and submit
MACHINEFILE=$(generate_pbs_nodefile)

env
cd /home/corvid/jwaters/Documents/slurm/Star_RUN

### Prefer openmpi
#/opt/Software/corvid/star-ccm/16.06.008-R8/STAR-CCM+16.06.008-R*/star/bin/starccm+ -power -machinefile "$MACHINEFILE" -np $SLURM_NTASKS -batch "$SIM_TITLE"
/opt/Software/corvid/star-ccm/16.06.008-R8/STAR-CCM+16.06.008-R*/star/bin/starccm+ -power -mpi "openmpi" -machinefile "$MACHINEFILE" -np $SLURM_NTASKS -batch "$SIM_TITLE"

### Works
/opt/Software/corvid/star-ccm/16.06.008-R8/STAR-CCM+16.06.008-R*/star/bin/starccm+ -power -mpi intel -machinefile "$MACHINEFILE" -np $SLURM_NTASKS -batch "$SIM_TITLE"

Possibly a similar issue from 2 years ago:

Yeah, looking into it, it looks like this issue has a fix currently in place for what you need:

Rebuilding the app off master gives an option under “Job Options” to copy the environment, but it is not in the 2.0 release branch, and as of now there are no plans to backport this, though I am going to add it to the 2.1 milestone.

I wanted to update that this will be in 2.1, but until then I had another idea. Are the jobs from the Job Composer app all going to their own cluster, using their own clusters.d file? If so, what about trying to set the env using bin_overrides in the clusters.d/job_composer_cluster.yml config file:

https://osc.github.io/ood-documentation/latest/installation/cluster-config-schema.html#bin-overrides

And the needed Adapter:

https://osc.github.io/ood-documentation/latest/installation/resource-manager/slurm.html
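A rough sketch of what that could look like: point bin_overrides for sbatch at a small wrapper script that forces the environment to be exported (the wrapper path and filename below are made up for illustration, and /usr/bin/sbatch should be whatever the real binary path is on your submit host):

#!/bin/bash
# Hypothetical wrapper, e.g. /usr/local/bin/sbatch_export_all, referenced as
# the sbatch entry under bin_overrides in the clusters.d YAML.
# Override any SLURM_EXPORT_ENV=NONE set by the caller, and pass --export=ALL
# as well for good measure.
export SLURM_EXPORT_ENV=ALL
exec /usr/bin/sbatch --export=ALL "$@"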

Let me know if this looks like it might work or if you have any questions around it.

Able to get STAR-CCM+ to run.

The workaround is to prefix the command with SLURM_EXPORT_ENV=ALL.
We also have a similar issue with srun; there, use srun --export=ALL.

SLURM_EXPORT_ENV=ALL /opt/Software/corvid/star-ccm/16.06.008-R8/STAR-CCM+16.06.008-R*/star/bin/starccm+ -power -mpi openmpi -machinefile "$MACHINEFILE" -np $SLURM_NTASKS -batch "$SIM_TITLE"
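For the srun case mentioned above, the equivalent inside the job script would look something like this (a sketch; ./my_mpi_program stands in for whatever the job step actually runs):

# Force srun to export the full environment for the job step.
srun --export=ALL --ntasks=$SLURM_NTASKS ./my_mpi_program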
