Hi all,
I’m using Open OnDemand 4.x and I have a custom installation of Slurm on my system. The Slurm binaries are located in a non-standard path, and they depend on shared libraries (.so files) that are also located outside the standard library paths.
Here’s a redacted version of my current cluster YAML configuration:
v2:
  metadata:
    title: FooBar
    url: "https://foo.bar.local/"
  login:
    host: 192.168.10.200
  job:
    adapter: "slurm"
    bin: "/opt/share/sw/intel/gcc-11.4.1/slurm-24.11.3/bin"
    conf: "/opt/share/sw/intel/gcc-11.4.1/slurm-24.11.3/etc/slurm.conf"
The problem is that the Slurm binaries won’t run correctly unless LD_LIBRARY_PATH includes this directory:
/opt/share/sw/intel/gcc-11.4.1/slurm-24.11.3/lib
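
For context, a quick check from an interactive shell works as expected once that directory is exported first, roughly along these lines (just to illustrate what I mean):

export LD_LIBRARY_PATH=/opt/share/sw/intel/gcc-11.4.1/slurm-24.11.3/lib:$LD_LIBRARY_PATH
/opt/share/sw/intel/gcc-11.4.1/slurm-24.11.3/bin/sinfo --version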
Is there a supported way in OOD v4 to set LD_LIBRARY_PATH in the cluster config YAML, or otherwise inject environment variables into the context where sbatch, squeue, and the other Slurm commands are executed?
If YAML doesn’t support this directly, what’s the best practice in OOD v4 for this situation? I’d prefer not to modify system-wide environment settings if possible.
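
The only workaround I can think of, if I understand bin_overrides correctly, is to point it at small wrapper scripts that export the library path and then exec the real binaries. A rough sketch of what I mean (the wrapper directory /opt/ood/slurm-wrappers is just a placeholder name I made up):

#!/bin/bash
# hypothetical wrapper, e.g. saved as /opt/ood/slurm-wrappers/sbatch
export LD_LIBRARY_PATH=/opt/share/sw/intel/gcc-11.4.1/slurm-24.11.3/lib:$LD_LIBRARY_PATH
exec /opt/share/sw/intel/gcc-11.4.1/slurm-24.11.3/bin/sbatch "$@"

and then in the cluster YAML something like:

  job:
    adapter: "slurm"
    conf: "/opt/share/sw/intel/gcc-11.4.1/slurm-24.11.3/etc/slurm.conf"
    bin_overrides:
      sbatch: "/opt/ood/slurm-wrappers/sbatch"
      squeue: "/opt/ood/slurm-wrappers/squeue"
      scancel: "/opt/ood/slurm-wrappers/scancel"
      scontrol: "/opt/ood/slurm-wrappers/scontrol"

But I’d rather use a supported way to set the environment directly, if one exists.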
This is the error that appears:
An error occurred when submitting jobs for simulation 1: sbatch: error: plugin_load_from_file: dlopen(/opt/share/sw/intel/gcc-11.4.1/slurm-24.11.3/lib/slurm/auth_munge.so): libmunge.so.2: cannot open shared object file: No such file or directory
sbatch: error: Couldn't load specified plugin name for auth/munge: Dlopen of plugin file failed
sbatch: error: cannot create auth context for auth/munge
sbatch: fatal: failed to initialize auth plugin
Best