I’m working with a researcher who is trying to submit a Slurm batch job from the terminal of an interactive desktop — essentially a job submitted from within a running job (the interactive desktop itself). The job works when submitted from the command line on the login node, and it also works when submitted from within an interactive bash session created with
salloc. When submitted from an interactive desktop, however, both the job and the interactive desktop session crash. The only error output from Slurm is:
srun: fatal: SLURM_MEM_PER_CPU, SLURM_MEM_PER_GPU, and SLURM_MEM_PER_NODE are mutually exclusive.
I can’t find much about this online that seems relevant, especially since the only time it fails is from within the OOD interactive desktop. Any guidance would be welcome.
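For anyone hitting the same error, a quick diagnostic sketch (my own, not from Slurm's tooling) is to check, inside the desktop's terminal, which of the three mutually exclusive memory variables the enclosing job has exported — srun refuses to start if more than one is present:

```shell
#!/bin/bash
# Diagnostic sketch: list which of the three mutually exclusive Slurm memory
# variables are exported in the current environment. Run it inside the
# interactive desktop's terminal before submitting the batch job.
check_mem_vars() {
  for var in SLURM_MEM_PER_CPU SLURM_MEM_PER_GPU SLURM_MEM_PER_NODE; do
    # printenv exits non-zero when the variable is not in the environment
    if printenv "$var" >/dev/null 2>&1; then
      echo "$var=$(printenv "$var")"
    fi
  done
}
check_mem_vars
```

If more than one line prints, that is the conflict srun is complaining about.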
Hello and welcome!
My guess is that, because the interactive desktop is itself a scheduled job, some environment variables set for it are conflicting with what is set for the submitted job.
What are those values set to in the interactive desktop?
OOD defaults to submitting jobs with --export=NONE. We had to set the SLURM_EXPORT_ENV environment variable in our desktops so that things like
salloc can work successfully. Seems like you may need to do the same.
Thanks, Jeff. That didn’t seem to change any behavior for me; it still crashes.
Following up on the post from @travert, the only one of those variables with a value within the interactive session is SLURM_MEM_PER_NODE, which is set to the amount specified on the form requesting an interactive desktop. (We allow users to specify between 2 and 128 GB of memory for their desktop session.) I imagine the job script also attempts to set a memory value, which is probably the source of the conflict. If that is the case, however, I don’t understand why it works from within an salloc session.
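One workaround worth trying (an assumption on my part, not a confirmed fix): clear the inherited memory variables in the desktop's terminal before submitting, so the batch script's own memory request is the only one in play:

```shell
#!/bin/bash
# Hypothetical workaround: drop the memory variables inherited from the
# enclosing desktop job before submitting, so the batch script's own
# --mem/--mem-per-cpu directives are the only memory specification.
unset SLURM_MEM_PER_CPU SLURM_MEM_PER_GPU SLURM_MEM_PER_NODE

# job.sh is a placeholder for the researcher's actual batch script;
# the guard keeps this sketch runnable on machines without Slurm.
if command -v sbatch >/dev/null 2>&1; then
  sbatch job.sh
fi
```

This only affects the one terminal session, so it shouldn't disturb the desktop job itself.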
This topic was automatically closed 180 days after the last reply. New replies are no longer allowed.