I’ve just installed v1.7.14 on CentOS 8.2.2004. I can log in using SSSD/LDAP, open a shell in the browser, and run an interactive job in that shell. Very pleased.
For whatever reason, though, I can’t get the Active Jobs page to work.
Strictly, the documentation for Add Cluster Config suggests: “In production you will also want to add a resource manager.”
Because of the way our HPC (PBSPro 19.1.3) is set up, regular users don’t have login access to the Resource Manager, only to the Login Nodes, from which they can submit jobs. But having the job: host: and login: host: (in /etc/ood/config/clusters.d/server.yml) identical isn’t working. Users can successfully run qstat and qselect on the login nodes.
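For illustration, a cluster config with identical login and job hosts would look roughly like the sketch below. The hostname server.gen is taken from the error messages in this post; the title and the exec path are assumed placeholders, not values from the actual file.

```yaml
# /etc/ood/config/clusters.d/server.yml (illustrative sketch, not the actual file)
---
v2:
  metadata:
    title: "Server"        # assumed display name
  login:
    host: "server.gen"     # login node that users can reach
  job:
    adapter: "pbspro"
    host: "server.gen"     # identical to the login host, as described above
    exec: "/opt/pbs"       # assumed PBS Pro install prefix
```

The errors report that qselect and qstat cannot connect to server.gen, which suggests the job host value is what the PBS client commands are trying to contact as the server.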
The error I’m seeing in the UI is: “Server: Connection refused qselect: cannot connect to server server.gen (errno=111)”
In /var/log/httpd/error.log I’m seeing a lot of this:
[Tue Jun 30 02:16:48.436015 2020] [lua:warn] [pid 2064:tid 139921695598336] AH01471: Lua error: /opt/ood/mod_ood_proxy/lib/logger.lua:22: bad argument #2 to 'date' (number has no integer representation)
And the error I’m seeing in /var/log/ondemand-nginx/user/error.log looks like this:
App 2958 output: [2020-06-30 02:18:42 -0400 ] ERROR "OodCore::JobAdapterError: Connection refused\nqstat: cannot connect to server server.gen (errno=111)\n\n/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.14/gems/ood_core-0.11.4/lib/ood_core/job/adapters/pbspro.rb:290:in `rescue in info_all'\n/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.14/gems/ood_core-0.11.4/lib/ood_core/job/adapters/pbspro.rb:285:in `info_all'\n/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.14/gems/ood_core-0.11.4/lib/ood_core/job/adapter.rb:84:in `info_all_each'\n/var/www/ood/apps/sys/activejobs/app/models/jobs_json_request_handler.rb:46:in `each'\n/var/www/ood/apps/sys/activejobs/app/models/jobs_json_request_handler.rb:46:in `each_slice'\n/var/www/ood/apps/sys/activejobs/app/models/jobs_json_request_handler.rb:46:in `block in render'\n/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.14/gems/ood_core-0.11.4/lib/ood_core/clusters.rb:123:in `each'\n/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.14/gems/ood_core-0.11.4/lib/ood_core/clusters.rb:123:in `each'\n/var/www/ood/apps/sys/activejobs/app/models/jobs_json_request_handler.rb:44:in `each_with_index'\n/var/www/ood/apps/sys/activejobs/app/models/jobs_json_request_handler.rb:44:in `render'\n/var/www/ood/apps/sys/activejobs/app/controllers/jobs_controller.rb:18:in `block (2 levels) in index'\n/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.14/gems/actionpack-5.2.4.3/lib/action_controller/metal/mime_responds.rb:203:in `respond_to'\n/var/www/ood/apps/sys/activejobs/app/controllers/jobs_controller.rb:9:in `index'\n/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.14/gems/actionpack-5.2.4.3/lib/action_controller/metal/basic_implicit_render.rb:6:in `send_action'\n/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.14/gems/actionpack-5.2.4.3/lib/abstract_controller/base.rb:194:in 
`process_action'\n/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.14/gems/actionpack-5.2.4.3/lib/action_controller/metal/rendering.rb:30:in `process_action'\n/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.14/gems/actionpack-5.2.4.3/lib/abstract_controller/callbacks.rb:42:in `block in process_action'\n/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.14/gems/activesupport-5.2.4.3/lib/active_support/callbacks.rb:132:in `run_callbacks'\n/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.14/gems/actionpack-5.2.4.3/lib/abstract_controller/callbacks.rb:41:in `process_action'\n/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.14/gems/actionpack-5.2.4.3/lib/action_controller/metal/rescue.rb:22:in `process_action'\n/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.14/gems/actionpack-5.2.4.3/lib/action_controller/metal/instrumentation.rb:34:in `block in process_action'\n/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.14/gems/activesupport-5.2.4.3/lib/active_support/notifications.rb:168:in `block in instrument'\n/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.14/gems/activesupport-5.2.4.3/lib/active_support/notifications/instrumenter.rb:23:in `instrument'\n/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.14/gems/activesupport-5.2.4.3/lib/active_support/notifications.rb:168:in `instrument'\n/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.14/gems/actionpack-5.2.4.3/lib/action_controller/metal/instrumentation.rb:32:in `process_action'\n/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.14/gems/actionpack-5.2.4.3/lib/action_controller/metal/params_wrapper.rb:256:in `process_action'\n/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.14/gems/actionpack-5.2.4.3/lib/abstract_controller/base.rb:134:in `process'\n/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.14/gems/actionview-5.2.4.3/lib/action_view/rendering.rb:32:in 
`process'\n/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.14/gems/actionpack-5.2.4.3/lib/action_controller/metal/live.rb:255:in `block (2 levels) in process'\n/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.14/gems/activesupport-5.2.4.3/lib/active_support/dependencies/interlock.rb:42:in `block in running'\n/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.14/gems/activesupport-5.2.4.3/lib/active_support/concurrency/share_lock.rb:162:in `sharing'\n/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.14/gems/activesupport-5.2.4.3/lib/active_support/dependencies/interlock.rb:41:in `running'\n/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.14/gems/actionpack-5.2.4.3/lib/action_controller/metal/live.rb:247:in `block in process'\n/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.14/gems/actionpack-5.2.4.3/lib/action_controller/metal/live.rb:291:in `block in new_controller_thread'"
App 2958 output: [2020-06-30 02:18:42 -0400 ] INFO "method=GET path=/pun/sys/activejobs/jobs.json format=json controller=JobsController action=index status=200 duration=8.51 view=0.00"
I’m still in a testing phase, building a proof of concept for the team, so I don’t yet have an FQDN or proper SSL set up.
Any tips would be appreciated.
EDIT: fixed typo, it’s clusters.d in /etc/