Hi all,
I’m running Open OnDemand 4.x with a custom Slurm installation. The Slurm binaries live in a non-standard path, and they depend on shared libraries (.so files) that also sit outside the standard library search paths.
Here’s a redacted version of my current cluster YAML configuration:
v2:
  metadata:
    title: FooBar
    url: "https://foo.bar.local/"
  login:
    host: 192.168.10.200
  job:
    adapter: "slurm"
    bin: "/opt/share/sw/intel/gcc-11.4.1/slurm-24.11.3/bin"
    conf: "/opt/share/sw/intel/gcc-11.4.1/slurm-24.11.3/etc/slurm.conf"
The problem is that Slurm binaries won’t run correctly unless LD_LIBRARY_PATH includes this directory:
/opt/share/sw/intel/gcc-11.4.1/slurm-24.11.3/lib
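Everything behaves as expected if I export that path by hand from an interactive shell on the OOD host first, e.g.:

# With the lib dir on LD_LIBRARY_PATH the binary starts fine;
# without it I get the dlopen error shown further below.
export LD_LIBRARY_PATH="/opt/share/sw/intel/gcc-11.4.1/slurm-24.11.3/lib:${LD_LIBRARY_PATH}"
/opt/share/sw/intel/gcc-11.4.1/slurm-24.11.3/bin/sbatch --version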
Is there a supported way in OOD v4 to set LD_LIBRARY_PATH in the cluster config YAML, or otherwise inject environment variables into the context where sbatch, squeue, and other Slurm commands are executed?
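Ideally I could express this right in the cluster YAML. Something like the following is what I'm after (the env key here is made up purely to illustrate; I couldn't find anything like it in the docs):

v2:
  job:
    adapter: "slurm"
    # hypothetical, wished-for key -- does anything like this exist?
    env:
      LD_LIBRARY_PATH: "/opt/share/sw/intel/gcc-11.4.1/slurm-24.11.3/lib"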
If the cluster config doesn’t support anything like that directly, what’s the best practice in OOD v4 for this situation? I’d prefer not to modify system-wide environment settings if possible.
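The only workaround I’ve come up with so far is pointing job.bin_overrides at small wrapper scripts that export LD_LIBRARY_PATH and then exec the real binaries. A sketch of what I mean (the /opt/ood/slurm-wrappers directory is just a name I picked):

#!/bin/bash
# Hypothetical wrapper (one copy per command, or symlinks to a single script):
# prepend the non-standard lib dir, then exec the real Slurm binary of the
# same name from the custom install.
export LD_LIBRARY_PATH="/opt/share/sw/intel/gcc-11.4.1/slurm-24.11.3/lib:${LD_LIBRARY_PATH}"
exec "/opt/share/sw/intel/gcc-11.4.1/slurm-24.11.3/bin/$(basename "$0")" "$@"

with the cluster config pointing at the wrappers:

v2:
  job:
    adapter: "slurm"
    conf: "/opt/share/sw/intel/gcc-11.4.1/slurm-24.11.3/etc/slurm.conf"
    bin_overrides:
      sbatch: "/opt/ood/slurm-wrappers/sbatch"
      squeue: "/opt/ood/slurm-wrappers/squeue"
      scancel: "/opt/ood/slurm-wrappers/scancel"
      sinfo: "/opt/ood/slurm-wrappers/sinfo"

If wrappers are the intended approach I’m happy to use them, but I wanted to check before scripting around it.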
This is the error that appears:
An error occurred when submitting jobs for simulation 1: sbatch: error: plugin_load_from_file: dlopen(/opt/share/sw/intel/gcc-11.4.1/slurm-24.11.3/lib/slurm/auth_munge.so): libmunge.so.2: cannot open shared object file: No such file or directory
sbatch: error: Couldn't load specified plugin name for auth/munge: Dlopen of plugin file failed
sbatch: error: cannot create auth context for auth/munge
sbatch: fatal: failed to initialize auth plugin
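The missing dependency should also show up if the plugin is checked with ldd in a clean environment (i.e. without LD_LIBRARY_PATH set):

ldd /opt/share/sw/intel/gcc-11.4.1/slurm-24.11.3/lib/slurm/auth_munge.so | grep -i 'not found'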
Best