Basic confusion about cluster config

Hi all,

I think I am misunderstanding something basic. I thought that if I specified a section like this in my cluster config:

v2:
  ...
  login:
    host: "login_node_name.my.org"

…that OOD would ssh to that node to submit cluster jobs and run other cluster-related binaries. But when I try it I get an error that the sbatch binary is not found (it’s there on the login node, but not on my web node (yet)).

So it seems that if I really want to ssh to the head/login node I need to set up bin_overrides.

OK. But that leaves me wondering what the login section is for. We run Slurm here, and as long as you are on a head node you don't need to know the names of any other nodes; you can submit a job from the command line with sbatch.

So I am wondering what the login/host section is for. Maybe other schedulers work differently and require it?

Just a note about how I am setting things up: at the moment I am putting together some Ansible code to set up the web host (and I see there are Ansible roles to set up cluster configs, which is awesome). Currently the web host does not have Slurm installed, but eventually it will. In the meantime I plan to use bin_overrides just for the few of us who are testing this out. I realize that using an ssh wrapper in a bin_overrides script is not ideal for production, because we can't guarantee that every user will have passwordless ssh access to the login node.
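For context, this is roughly the shape of the bin_overrides config I have in mind; the wrapper path below is just a placeholder for a small script that sshes to the login node and runs the real sbatch there:

v2:
  ...
  job:
    adapter: "slurm"
    bin_overrides:
      # placeholder path; the wrapper sshes to the login node and execs the real sbatch
      sbatch: "/usr/local/bin/sbatch-ssh-wrapper"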

BTW, what threw me off was step 2 here. It seemed to imply (to me anyway) that OOD would ssh to this node to kick off jobs.

Thanks in advance.

It is for the shell app. The host listed there is the host you shell into from the Clusters menu.

Instead of bin_overrides, I believe you can supply a submit_host. That may be easier.
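Something along these lines, if I remember the Slurm adapter options right (the host names here are just your example values):

v2:
  ...
  login:
    host: "login_node_name.my.org"
  job:
    adapter: "slurm"
    # with submit_host set, OOD runs sbatch/squeue over ssh on that host instead of locally
    submit_host: "login_node_name.my.org"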
