That did indeed work and I did not need strict_host_checking: false.
For anyone else with this problem (submitting jobs through Slurm to a cluster when OOD is hosted on a different server from the cluster), here is what I believe to be a full fix.
- Copy the cluster’s slurm.conf file to the server hosting OOD.
  Note: In our slurm.conf file we had one line that referenced <servername>.cluster. We modified the slurm.conf on the OOD server to remove the .cluster part because it caused some errors, as that domain name was local to our cluster.
- In the cluster .yml file, the conf: value needs to refer to the location of slurm.conf on the OOD server.
- In the cluster .yml file, make sure you define submit_host: under the job section. According to one of the comments in this thread from the OOD folks, if you don’t define that, then you need Slurm running on the OOD server.
- On the OOD server, check /etc/ssh to see if a known_hosts file exists.
  a) If not, create one.
  b) Add the cluster’s public key to the known_hosts file on the OOD server.
  When I ran some of the rake tests I noticed some comments about ECDSA keys, so we manually copied/pasted our cluster’s public ECDSA key into the OOD server known_hosts. The cluster’s key file was also in the /etc/ssh directory. The format we used in known_hosts was <serverIP>,<serverhostname><space><publickey>
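
For reference, here is a rough sketch of what the job section of the cluster .yml file might look like with the conf: and submit_host: settings described above. The file name, title, host name, and slurm.conf path are all made-up placeholders, not our actual values, so adjust them for your site.

```yaml
# Hypothetical file: /etc/ood/config/clusters.d/my_cluster.yml
---
v2:
  metadata:
    title: "My Cluster"                  # placeholder display name
  job:
    adapter: "slurm"
    # Host that OOD will ssh to in order to run the Slurm commands.
    # "login.example.edu" is a placeholder, not a real host.
    submit_host: "login.example.edu"
    # Location of the copied slurm.conf on the OOD server (assumed path).
    conf: "/etc/slurm/slurm.conf"
```

With the cluster’s key in known_hosts as described above, we did not have to add strict_host_checking: false to this section.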