Hello,
I am working with a new OOD 2.0 deployment on RHEL 8.4 and am running into issues with Job Composer being unable to submit jobs. Here is the error message that shows up in the browser:
An error occurred when submitting jobs for simulation 3: pbsconf error: pbs conf variables not found: PBS_HOME
No such file or directory
qsub: cannot connect to server (errno=0)
I’m not sure how to track this error down. I can see the error message in the PUN logs, but it doesn’t add much information (actual full paths omitted):
App 37849 output: [2021-09-29 13:11:27 -0500 ] INFO "execve = [{\"PBS_DEFAULT\"=>\"headnode\", \"PBS_EXEC\"=>\"/path/to/openpbs/20.0.1\"}, \"/path/to/openpbs/20.0.1/bin/qsub\", \"-j\", \"oe\"]"
App 37849 output: [2021-09-29 13:11:27 -0500 ] ERROR "An error occurred when submitting jobs for simulation 3: pbsconf error: pbs conf variables not found: PBS_HOME\nNo such file or directory\nqsub: cannot connect to server (errno=0)"
The weird thing is, I can successfully submit this job just fine from the OOD host, with the same user account, via the terminal. Furthermore, when I launch a terminal session on the OOD host, PBS_HOME, as well as PBS_EXEC and PBS_CONF_FILE, are set appropriately. I also have the correct paths set in /etc/ood/config/clusters.d/cluster.yml:
job:
  adapter: "pbspro"
  host: "headnode"
  exec: "/path/to/openpbs/20.0.1"
The last two lines match the contents of PBS_CONF_FILE, which is working on this and other nodes via the terminal.
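To make the difference between the two environments concrete, here is a minimal sketch that checks which of the three PBS variables mentioned above are missing from a given environment. The function name missing_pbs_vars is hypothetical, and logged_env is copied from the execve line in the PUN log. This assumes the log lists every PBS variable the submission sees, which may not be true: the log may only show the variables OOD adds on top of the PUN process environment.

```python
import os

# The three variables that are set in my terminal session on the OOD host.
EXPECTED_VARS = ("PBS_HOME", "PBS_EXEC", "PBS_CONF_FILE")


def missing_pbs_vars(env, expected=EXPECTED_VARS):
    """Return the expected variable names that are unset or empty in env."""
    return [name for name in expected if not env.get(name)]


# Environment hash copied from the execve line in the PUN log (paths omitted).
logged_env = {
    "PBS_DEFAULT": "headnode",
    "PBS_EXEC": "/path/to/openpbs/20.0.1",
}

if __name__ == "__main__":
    # Variables absent from what OOD passed to qsub, per the log.
    print("missing in logged env:", missing_pbs_vars(logged_env))
    # Same check against the shell this script is launched from.
    print("missing in this shell:", missing_pbs_vars(dict(os.environ)))
```

Run from a terminal on the OOD host, the second line should come back empty if the variables are set there as described, while the first shows what the logged environment lacks.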
Any ideas what could be wrong?