I am trying to submit a Slurm job through OOD and I am getting the following error:
An error occurred when submitting jobs for simulation 1: sbatch: error: s_p_parse_file: cannot stat file /opt/slurm/slurm.conf: No such file or directory, retrying in 1sec up to 60sec
sbatch: error: ClusterName needs to be specified
sbatch: fatal: Unable to process configuration file
I am using the following as documentation:
Setup SLURM
The OOD server is a separate server from the cluster.
The contents of my .yml file for the cluster on the OOD server are as follows:
---
v2:
  metadata:
    title: "F&M Research Cluster"
    url: "https://dorcfandm.github.io/rcs.github.io/"
  login:
    host: "rcs-scsn.fandm.edu"
  job:
    adapter: "slurm"
    cluster: "rcs-sc"
    bin: "/usr/bin"
    conf: "/opt/slurm/slurm.conf"
I also tried to test the setup via the command line using:
su $USER -c 'scl enable ondemand -- bin/rake test:jobs:cluster1 RAILS_ENV=production'
with the following output:
Rails Error: Unable to access log file. Please ensure that /var/www/ood/apps/sys/dashboard/log/production.log exists and is writable (ie, make it writable for user and group: chmod 0664 /var/www/ood/apps/sys/dashboard/log/production.log). The log level has been raised to WARN and the output directed to STDERR until the problem is fixed.
mkdir -p /home/user/test_jobs
Testing cluster 'cluster'...
Submitting job...
[2023-02-07 10:04:42 -0500 ] INFO "execve = [{\"SLURM_CONF\"=>\"/opt/slurm/slurm.conf\"}, \"/usr/bin/sbatch\", \"-D\", \"/home/user/test_jobs\", \"-J\", \"test_jobs_cluster\", \"-o\", \"/home/user/test_jobs/output_cluster_2023_02_07t10_04_42_05_00_log\", \"-t\", \"00:01:00\", \"--export\", \"NONE\", \"--parsable\", \"-M\", \"rcs-sc\"]"
rake aborted!
OodCore::JobAdapterError: sbatch: error: s_p_parse_file: cannot stat file /opt/slurm/slurm.conf: No such file or directory, retrying in 1sec up to 60sec
sbatch: error: ClusterName needs to be specified
sbatch: fatal: Unable to process configuration file
/opt/ood/ondemand/root/usr/share/gems/2.7/ondemand/2.0.29/gems/ood_core-0.22.0/lib/ood_core/job/adapters/slurm.rb:477:in `rescue in submit'
/opt/ood/ondemand/root/usr/share/gems/2.7/ondemand/2.0.29/gems/ood_core-0.22.0/lib/ood_core/job/adapters/slurm.rb:415:in `submit'
/var/www/ood/apps/sys/dashboard/lib/tasks/test.rake:30:in `block (4 levels) in <top (required)>'
Caused by:
OodCore::Job::Adapters::Slurm::Batch::Error: sbatch: error: s_p_parse_file: cannot stat file /opt/slurm/slurm.conf: No such file or directory, retrying in 1sec up to 60sec
sbatch: error: ClusterName needs to be specified
sbatch: fatal: Unable to process configuration file
/opt/ood/ondemand/root/usr/share/gems/2.7/ondemand/2.0.29/gems/ood_core-0.22.0/lib/ood_core/job/adapters/slurm.rb:335:in `call'
/opt/ood/ondemand/root/usr/share/gems/2.7/ondemand/2.0.29/gems/ood_core-0.22.0/lib/ood_core/job/adapters/slurm.rb:244:in `submit_string'
/opt/ood/ondemand/root/usr/share/gems/2.7/ondemand/2.0.29/gems/ood_core-0.22.0/lib/ood_core/job/adapters/slurm.rb:475:in `submit'
/var/www/ood/apps/sys/dashboard/lib/tasks/test.rake:30:in `block (4 levels) in <top (required)>'
Tasks: TOP => test:jobs:cluster
(See full trace by running task with --trace)
/opt/slurm/slurm.conf is indeed the correct location for slurm.conf on our cluster, and the file is world-readable. There is only one cluster set up, and its name is rcs-sc.
Thank you in advance for any help.
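Since the OOD server is a separate machine from the cluster, one check that may be worth running on the OOD host itself (as the submitting user) is whether the `conf` path is visible there. The logged `execve` line shows that path being passed to `/usr/bin/sbatch` as `SLURM_CONF`. Below is a minimal sketch, assuming a POSIX shell. `check_slurm_conf` is a hypothetical helper written for this post, not part of Slurm or OOD.

```shell
#!/bin/sh
# Hypothetical helper: report whether a Slurm config file is visible
# to the current user on the host where this script runs.
check_slurm_conf() {
    conf_path="$1"
    if [ -r "$conf_path" ]; then
        echo "readable: $conf_path"
    elif [ -e "$conf_path" ]; then
        echo "exists but not readable: $conf_path"
        return 1
    else
        echo "missing: $conf_path"
        return 1
    fi
}

# Default path is the `conf:` value from the cluster .yml; SLURM_CONF overrides it.
check_slurm_conf "${SLURM_CONF:-/opt/slurm/slurm.conf}" || true
```

If this prints "missing" on the OOD server but "readable" on a cluster node, that would be consistent with the `cannot stat file` message coming from sbatch on the OOD host.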