Hi all,
I think I am misunderstanding something basic. I thought that if I specified a section like this in my cluster config:
v2:
  ...
  login:
    host: "login_node_name.my.org"
…that OOD would ssh to that node to submit cluster jobs and run other cluster-related binaries. But when I try it, I get an error that the sbatch binary is not found (it is installed on the login node, but not yet on my web node).
So it seems that if I really want to ssh to the head/login node I need to set up bin_overrides.
OK. But that leaves me wondering what the login section is for. We run Slurm here, and as long as you are on a head node you don’t need to know the names of any other nodes; you can submit a job from the command line with sbatch.
Is the login/host section there because other schedulers work differently and require it?
Just a note about how I am setting things up: at the moment I am putting together some Ansible code to set up the web host (and I see there are Ansible roles to set up cluster configs, which is awesome). Currently the web host does not have Slurm installed, but eventually it will. In the meantime I plan to use bin_overrides just for the few of us who are testing this out. I realize that using an ssh wrapper in a bin_overrides script is not really ideal for a production scenario, because we can’t guarantee that every user will have passwordless ssh access to the login node.
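For reference, the kind of ssh wrapper I have in mind looks roughly like the sketch below. It is untested against OOD itself: the function names are made up for illustration, the host is the placeholder from my config above, and I am assuming sbatch is on the PATH of the login node and that the script would live on the web node and be pointed to from bin_overrides.

```python
#!/usr/bin/env python3
"""Hypothetical sbatch wrapper for bin_overrides: forwards the call over ssh."""
import shlex
import subprocess
import sys

# Assumptions for illustration only: the placeholder host from the cluster
# config, and a remote binary that is on the login node's PATH.
LOGIN_HOST = "login_node_name.my.org"
REMOTE_BINARY = "sbatch"


def build_ssh_command(args, host=LOGIN_HOST, binary=REMOTE_BINARY):
    """Return the argv list that runs `binary args...` on `host` via ssh."""
    # ssh joins its trailing arguments into one string for the remote shell,
    # so each argument is quoted to survive that second round of parsing.
    remote = " ".join(shlex.quote(a) for a in [binary, *args])
    return ["ssh", host, remote]


def main(argv=None):
    """Run the remote command and return its exit status."""
    args = sys.argv[1:] if argv is None else argv
    # stdin is inherited, so a job script piped to the wrapper reaches sbatch.
    return subprocess.run(build_ssh_command(args)).returncode
```

To act as a real wrapper, the script would also need to end with `sys.exit(main())` and be made executable. It only works for users who already have passwordless ssh to the login node, which is exactly the limitation I mentioned.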
BTW, what threw me off was step 2 here. It seemed to imply (to me anyway) that OOD would ssh to this node to kick off jobs.
Thanks in advance.