We’re running into an issue where Slurm is sending out the job start email even before the session actually begins running.
Here’s our setup:
We’re using a global attribute (bc_email_on_started) to trigger emails, and behind the scenes Slurm’s MailProg setting points to a wrapper script. For email handling, we’re using Postfix along with AWS SES.
The issue is that even when a session is still queued (for example, when there’s no available compute capacity), the email still gets sent.
From what we understand, this global attribute defaults to --mail-type=BEGIN, but we’re not sure how to modify this behavior so that the email only triggers when the job actually starts running on a compute node, not while it’s still in the queue.
Here’s the relevant snippet from our configuration:
bc_email_on_started:
  label: "Email When Session Starts"
  widget: check_box
  value: false
It all looks to be hard-coded, so as far as I’m aware there is no config change within OOD that could alter this.
I guess my question is: does Slurm actually have an option for what you’re asking for? Because as far as I know (and I’m not a Slurm expert by any means) it does not. Looking at the Slurm sbatch documentation, I only see the options we already set, but maybe there’s some clever trick here I’m not understanding.
Thanks for the response,
Yeah, from what I can see, it looks like Slurm doesn’t support this behavior natively — at least not for queued jobs. Ideally, when the controller shows that a job is in a backfill or queued state (because no node is available yet), the --mail-type=BEGIN flag should only trigger once the job actually starts running. But right now, it seems to be sending the email even before that happens.
So, it looks like this might be a limitation on the Slurm side. We may need to come up with a custom workaround or some sort of additional logic to make sure the email only goes out once the job actually begins execution on a compute node.
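One possible shape for that extra logic, as a rough sketch only: have the mail wrapper look up the job’s current state and forward the BEGIN mail only if the job has reached a compute node. The function names below are made up for illustration, and the sketch assumes the subject line passed to the wrapper contains "Job_id=<number>", which should be checked against what your Slurm version actually sends.

```shell
#!/bin/sh
# Sketch of a filter for a mail wrapper script. All function names are hypothetical.

# Pull the numeric job ID out of a mail subject.
# Assumption: the subject contains "Job_id=<number>". Prints nothing if it does not.
job_id_from_subject() {
  printf '%s\n' "$1" | sed -n 's/.*Job_id=\([0-9][0-9]*\).*/\1/p'
}

# Succeed only for states where the job has reached a compute node.
should_send_begin_mail() {
  case "$1" in
    RUNNING|COMPLETING|COMPLETED) return 0 ;;
    *) return 1 ;;
  esac
}

# Ask Slurm for the job's current state (e.g. PENDING, RUNNING).
# Prints nothing if squeue no longer lists the job.
current_job_state() {
  squeue -h -j "$1" -o '%T' 2>/dev/null
}
```

Inside the wrapper this could be wired up roughly as `should_send_begin_mail "$(current_job_state "$(job_id_from_subject "$subject")")"` before handing the message to Postfix. Note that a mail dropped while the job is still pending is never re-sent by this sketch, so it would need a retry step or a separate notification sent from the job itself.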
This is kind of a hack, but somewhat related. In one of our apps that uses multiple long-running jobs with Slurm dependencies, we also want to know when the job finishes, so we update the MailType option within the before.sh.erb script:
<% if context.bc_email_on_started == "1" %>
# Add all emails if form option is checked
scontrol update jobid=$SLURM_JOBID MailType=ALL
<% end %>
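To check that the update took effect, something like the following could be used from inside the job. This is a small sketch: the helper name is made up, and it assumes the `scontrol show job` output contains a `MailType=...` field on one of its lines, which may vary between Slurm versions.

```shell
# Hypothetical helper: print the MailType value found in
# `scontrol show job` output supplied on stdin.
mail_type_from_scontrol() {
  sed -n 's/.*MailType=\([^ ]*\).*/\1/p' | head -n 1
}
```

Usage would be along the lines of `scontrol show job "$SLURM_JOBID" | mail_type_from_scontrol`.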