There has been some earlier discussion of different sites implementing e-mail alerts when OOD jobs start, via the bc_email_on_started attribute. For the user to get the e-mail, the scheduler also needs the user’s e-mail address, so the option that passes it in has to be set (in SLURM, --mail-user).
I can put this in each app’s submit.yml.erb (e.g., in our case --mail-user=$USER@utah.edu), but given the number of apps we have, it’d be easier if this could be a global option for all jobs, or at least for the interactive apps. Is it possible to add a global scheduler directive that would be used by all the apps?
Semi-answering my own question: I found the “Global email setting” thread, though it’s 2.5 years old.
There, Eric suggests looking at the global entry in the cluster definition file, but I can confirm his expectation that $USER is not rendered by SLURM in the script. Has there been any progress on this since then? That is, is it now possible to ERB-render the cluster definition files?
Our Slurm just knows what email address to send to - I’d say this is the problem to solve. There’s another topic where I may have said something similar. @tdockendorf please advise.
A quick Google search turned up the MailDomain option in slurm.conf, though our Slurm may be able to route to any domain, given that we have a lot of OSU and OSC users.
The way our mail works is you can send an email directly to “tdockendorf” with something like this:
echo "TEST" | mail -s TEST tdockendorf
That sends an email to the local postfix daemon, which forwards all postfix mail to our central mail server, and that central mail server maps usernames to the email addresses stored in LDAP. I believe all SLURM is doing is sending email to something like “tdockendorf@”, which gets forwarded to our central mail server and then routed based on the LDAP data. Our main mail server is one of the few mail systems at our site allowed to send mail directly to mail servers on the internet without ending up in spam folders. Most of that is handled by the DNS TXT and SPF records we define for our domains.
Let me know if our postfix configs would be useful and I can send relevant config entries.
Thanks Trey, your test works at our place too - I figure our admins have mail set up like you do.
What’s not clear to me is whether one at OSC needs to specify sbatch’s --mail-user option in order to receive the mails. That seems to be the case here at CHPC: without --mail-user we don’t get the e-mail.
bc_email_on_started works out of the box for us without additionally specifying --mail-user because, as Trey mentioned, our mail servers get the email address from LDAP.
Did you run that test on a compute node or a login node? Or maybe it needs to be enabled on the Slurm Controller node? Maybe that’s where you need to run a test from. I’d guess that’s the origin server of any emails.
Based on the man page for sbatch, you need --mail-type to enable mail, and the value for --mail-user defaults to the system username running the job.
Deleting my previous remark - the reason things worked was an ancient .forward in my home dir.
As mentioned in the other thread, it’d be nice to have a central location for the email field, e.g. in an ERB-processed cluster definition file.
What we’ll do for now is put the line
email: <%= ENV["USER"] %>@utah.edu
in each app’s submit.yml.erb.
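For anyone following along, here is a minimal Ruby sketch of what that ERB line expands to. It renders the same line outside OOD with Ruby’s standard erb library. The `script:` nesting, the `render_submit` helper, and the sample username `u0123456` are assumptions for illustration only, so check the placement against your own submit.yml.erb.

```ruby
require "erb"

# Template fragment as it might appear in an app's submit.yml.erb.
# The `script:` nesting is an assumption; adjust it to match your app.
TEMPLATE = <<~YAML
  script:
    email: <%= ENV["USER"] %>@utah.edu
YAML

# Hypothetical helper: render the fragment as if `user` were logged in.
# ENV["USER"] is set temporarily and restored afterwards.
def render_submit(user)
  saved = ENV["USER"]
  ENV["USER"] = user
  ERB.new(TEMPLATE).result(binding)
ensure
  ENV["USER"] = saved
end

puts render_submit("u0123456")
```

The rendered YAML carries a full address, so nothing depends on the scheduler expanding $USER itself.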
I’ll also be looking at replacing the e-mail address with a script that looks up the address in our user database, since we have some users from other institutions who don’t use their utah.edu address regularly.
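A rough sketch of what such a lookup could look like in Ruby, with a fallback to the utah.edu address. The `EMAIL_OVERRIDES` hash, the `email_for` helper, and the sample entries are all hypothetical; a real version would query the user database rather than a hard-coded table.

```ruby
# Hypothetical username-to-address overrides. In practice these would come
# from the site's user database, not a hard-coded hash.
EMAIL_OVERRIDES = {
  "u0999999" => "visitor@example.org"
}.freeze

DEFAULT_DOMAIN = "utah.edu"

# Return the preferred address for a user, falling back to user@utah.edu
# when no override is on record.
def email_for(user, overrides = EMAIL_OVERRIDES)
  overrides.fetch(user) { "#{user}@#{DEFAULT_DOMAIN}" }
end

puts email_for(ENV.fetch("USER", "unknown"))
```

The fallback keeps the current behaviour for everyone without an override, so only the users from other institutions would need an entry.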
We are in the process of streamlining the email forwarding between our servers and campus but are not quite there yet.