How to set LD_LIBRARY_PATH for custom Slurm build in cluster YAML (OOD v4)

Hi all,

I’m using Open OnDemand 4.x and I have a custom installation of Slurm on my system. The Slurm binaries are located in a non-standard path, and they depend on shared libraries (.so) that are also located outside the standard library paths.

Here’s a redacted version of my current cluster YAML configuration:

v2:
  metadata:
    title: FooBar
    url: "https://foo.bar.local/"
  login:
    host: 192.168.10.200
  job:
    adapter: "slurm"
    bin: "/opt/share/sw/intel/gcc-11.4.1/slurm-24.11.3/bin"
    conf: "/opt/share/sw/intel/gcc-11.4.1/slurm-24.11.3/etc/slurm.conf"

The problem is that Slurm binaries won’t run correctly unless LD_LIBRARY_PATH includes this directory:

/opt/share/sw/intel/gcc-11.4.1/slurm-24.11.3/lib

Is there a supported way in OOD v4 to set LD_LIBRARY_PATH in the cluster config YAML, or otherwise inject environment variables into the context where sbatch, squeue, and other Slurm commands are executed?

If YAML doesn’t support this directly, what’s the best practice in OOD v4 for this situation? I’d prefer not to modify system-wide environment settings if possible.

This is the error that appears:

An error occurred when submitting jobs for simulation 1: sbatch: error: plugin_load_from_file: dlopen(/opt/share/sw/intel/gcc-11.4.1/slurm-24.11.3/lib/slurm/auth_munge.so): libmunge.so.2: cannot open shared object file: No such file or directory
sbatch: error: Couldn't load specified plugin name for auth/munge: Dlopen of plugin file failed
sbatch: error: cannot create auth context for auth/munge
sbatch: fatal: failed to initialize auth plugin

Best

I think this may be what you are looking for:

Particularly the bits about nginx_stage.yml and /etc/ood/profile.
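For the /etc/ood/profile route, a minimal sketch might look like the following. It assumes the stock profile lives at /opt/ood/nginx_stage/etc/profile and that a custom /etc/ood/profile should still load it first; both points are assumptions to verify against your install and the OOD documentation. The variable names DEFAULT_PROFILE and SLURM_LIB are only illustrative.

```shell
# Hypothetical /etc/ood/profile (sourced before the PUN starts).
# Assumption: the stock profile ships at /opt/ood/nginx_stage/etc/profile;
# verify the path on your install before relying on it.
DEFAULT_PROFILE="/opt/ood/nginx_stage/etc/profile"
if [ -r "$DEFAULT_PROFILE" ]; then
  . "$DEFAULT_PROFILE"
fi

# Prepend the custom Slurm lib directory, keeping any existing entries.
SLURM_LIB="/opt/share/sw/intel/gcc-11.4.1/slurm-24.11.3/lib"
LD_LIBRARY_PATH="${SLURM_LIB}${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
export LD_LIBRARY_PATH
```

Note that this changes the environment for the whole PUN, not just for the Slurm commands.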

Good luck!
Sean

Or, if it were me, I’d just write some wrappers to supply in place of the binaries in the bin directory, i.e., simple shell scripts that set the environment variables and then issue the real command.

I’m not sure how adding to LD_LIBRARY_PATH might affect PUN startup, since nginx will then boot with this library path, but it is probably OK.

Thank you all for the prompt replies! For now, I’ve resolved the issue by creating some wrapper scripts:

bin_overrides:
  squeue: "/opt/ood/slurm-wrapper/squeue"
  scontrol: "/opt/ood/slurm-wrapper/scontrol"
  sbatch: "/opt/ood/slurm-wrapper/sbatch"
  scancel: "/opt/ood/slurm-wrapper/scancel"
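The wrapper scripts themselves are not shown here. As a rough sketch of what they could look like, the snippet below defines a hypothetical helper, write_wrapper, that generates one wrapper per Slurm command. The paths come from this thread; the helper name, the SLURM_PREFIX variable, and the contents of the generated file are assumptions, not the actual scripts in use.

```shell
#!/bin/sh
# Sketch: generate one wrapper per Slurm command. Paths are taken from the
# thread; the function name and generated file contents are assumptions.

SLURM_PREFIX="${SLURM_PREFIX:-/opt/share/sw/intel/gcc-11.4.1/slurm-24.11.3}"

# write_wrapper DIR CMD: create DIR/CMD, a script that prepends the Slurm
# lib directory to LD_LIBRARY_PATH and then execs the real binary.
write_wrapper() {
  dir="$1"
  cmd="$2"
  mkdir -p "$dir" || return 1
  # Unquoted heredoc: SLURM_PREFIX and cmd expand now, escaped \$ expand
  # later, when the generated wrapper runs.
  cat > "$dir/$cmd" <<EOF
#!/bin/sh
# Keep any existing LD_LIBRARY_PATH entries after the Slurm lib directory.
LD_LIBRARY_PATH="${SLURM_PREFIX}/lib\${LD_LIBRARY_PATH:+:\$LD_LIBRARY_PATH}"
export LD_LIBRARY_PATH
exec "${SLURM_PREFIX}/bin/${cmd}" "\$@"
EOF
  chmod 755 "$dir/$cmd"
}

# Example (run as a user who can write to the target directory):
#   for c in sbatch squeue scontrol scancel; do
#     write_wrapper /opt/ood/slurm-wrapper "$c"
#   done
```

Using exec means the wrapper process is replaced by the real binary, so exit codes and output pass through to OOD unchanged, and the library path only applies to the Slurm commands rather than to the whole PUN.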

Next, I’ll look into the suggestion provided by @anderss.