I’ve just installed v1.7.14 on CentOS 8.2.2004. I can log in using SSSD/LDAP, open a shell in the browser, and run an interactive job in that shell. Very pleased.
However, I can’t get the Active Jobs page to work.
Strictly speaking, the documentation for “Add Cluster Config” says: “In production you will also want to add a resource manager.”
Because of the way our HPC (PBSPro 19.1.3) is set up, regular users don’t have login access to the Resource Manager, only to the Login Nodes, from which they can submit jobs. Setting job: host: and login: host: (in /etc/ood/config/clusters.d/server.yml) to the same login node isn’t working, even though users can successfully run qstat and qselect on the login nodes themselves.
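To take OnDemand out of the picture, the same query can be run by hand from the OnDemand web host. The snippet below is only a sketch. It assumes the PBS Pro client tools are installed under /opt/pbs on the web host (a placeholder path), and it assumes the value of job: host: ends up being used as the PBS server name, which I have not confirmed. PBS_DEFAULT is the standard PBS environment variable naming the default server; the function names are my own.

```python
import os
import subprocess


def qstat_env(server_host, base_env=None):
    """Return a copy of the environment with PBS_DEFAULT set to server_host."""
    env = dict(os.environ if base_env is None else base_env)
    # PBS client commands contact the server named in PBS_DEFAULT.
    env["PBS_DEFAULT"] = server_host
    return env


def run_qstat(server_host, pbs_exec="/opt/pbs"):
    """Run qstat against server_host; return (exit_code, stdout, stderr).

    pbs_exec is an assumed install prefix, adjust it to the local site.
    """
    qstat = os.path.join(pbs_exec, "bin", "qstat")
    result = subprocess.run(
        [qstat],
        env=qstat_env(server_host),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return result.returncode, result.stdout, result.stderr
```

Calling run_qstat as an ordinary user on the web host, with the same value that is in job: host:, should show whether the refusal comes from PBS itself rather than from OnDemand.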
The error I’m seeing in the UI is: Server: Connection refused qselect: cannot connect to server server.gen (errno=111)
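errno=111 is ECONNREFUSED on Linux, meaning nothing accepted the TCP connection on the target host and port. A quick way to check that from the web host is a plain socket probe. This is a minimal sketch, assuming the PBS server listens on its default port 15001 (a site may have changed this); can_connect is a hypothetical helper name.

```python
import socket


def can_connect(host, port=15001, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds, else False."""
    try:
        # create_connection resolves the name and opens a TCP connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

If can_connect("server.gen") returns False from the web host but True for the real PBS server host, that would suggest job: host: is pointing at a machine that does not run the PBS server.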
In /var/log/httpd/error.log I’m seeing a lot of this:
[Tue Jun 30 02:16:48.436015 2020] [lua:warn] [pid 2064:tid 139921695598336] AH01471: Lua error: /opt/ood/mod_ood_proxy/lib/logger.lua:22: bad argument #2 to 'date' (number has no integer representation)
And the error I’m seeing in /var/log/ondemand-nginx/user/error.log looks like this:
App 2958 output: [2020-06-30 02:18:42 -0400 ] ERROR "OodCore::JobAdapterError: Connection refused\nqstat: cannot connect to server server.gen (errno=111)\n\n/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.14/gems/ood_core-0.11.4/lib/ood_core/job/adapters/pbspro.rb:290:in `rescue in info_all'\n/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.14/gems/ood_core-0.11.4/lib/ood_core/job/adapters/pbspro.rb:285:in `info_all'\n/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.14/gems/ood_core-0.11.4/lib/ood_core/job/adapter.rb:84:in `info_all_each'\n/var/www/ood/apps/sys/activejobs/app/models/jobs_json_request_handler.rb:46:in `each'\n/var/www/ood/apps/sys/activejobs/app/models/jobs_json_request_handler.rb:46:in `each_slice'\n/var/www/ood/apps/sys/activejobs/app/models/jobs_json_request_handler.rb:46:in `block in render'\n/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.14/gems/ood_core-0.11.4/lib/ood_core/clusters.rb:123:in `each'\n/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.14/gems/ood_core-0.11.4/lib/ood_core/clusters.rb:123:in `each'\n/var/www/ood/apps/sys/activejobs/app/models/jobs_json_request_handler.rb:44:in `each_with_index'\n/var/www/ood/apps/sys/activejobs/app/models/jobs_json_request_handler.rb:44:in `render'\n/var/www/ood/apps/sys/activejobs/app/controllers/jobs_controller.rb:18:in `block (2 levels) in index'\n/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.14/gems/actionpack-5.2.4.3/lib/action_controller/metal/mime_responds.rb:203:in `respond_to'\n/var/www/ood/apps/sys/activejobs/app/controllers/jobs_controller.rb:9:in `index'\n/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.14/gems/actionpack-5.2.4.3/lib/action_controller/metal/basic_implicit_render.rb:6:in `send_action'\n/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.14/gems/actionpack-5.2.4.3/lib/abstract_controller/base.rb:194:in 
`process_action'\n/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.14/gems/actionpack-5.2.4.3/lib/action_controller/metal/rendering.rb:30:in `process_action'\n/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.14/gems/actionpack-5.2.4.3/lib/abstract_controller/callbacks.rb:42:in `block in process_action'\n/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.14/gems/activesupport-5.2.4.3/lib/active_support/callbacks.rb:132:in `run_callbacks'\n/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.14/gems/actionpack-5.2.4.3/lib/abstract_controller/callbacks.rb:41:in `process_action'\n/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.14/gems/actionpack-5.2.4.3/lib/action_controller/metal/rescue.rb:22:in `process_action'\n/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.14/gems/actionpack-5.2.4.3/lib/action_controller/metal/instrumentation.rb:34:in `block in process_action'\n/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.14/gems/activesupport-5.2.4.3/lib/active_support/notifications.rb:168:in `block in instrument'\n/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.14/gems/activesupport-5.2.4.3/lib/active_support/notifications/instrumenter.rb:23:in `instrument'\n/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.14/gems/activesupport-5.2.4.3/lib/active_support/notifications.rb:168:in `instrument'\n/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.14/gems/actionpack-5.2.4.3/lib/action_controller/metal/instrumentation.rb:32:in `process_action'\n/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.14/gems/actionpack-5.2.4.3/lib/action_controller/metal/params_wrapper.rb:256:in `process_action'\n/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.14/gems/actionpack-5.2.4.3/lib/abstract_controller/base.rb:134:in `process'\n/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.14/gems/actionview-5.2.4.3/lib/action_view/rendering.rb:32:in 
`process'\n/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.14/gems/actionpack-5.2.4.3/lib/action_controller/metal/live.rb:255:in `block (2 levels) in process'\n/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.14/gems/activesupport-5.2.4.3/lib/active_support/dependencies/interlock.rb:42:in `block in running'\n/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.14/gems/activesupport-5.2.4.3/lib/active_support/concurrency/share_lock.rb:162:in `sharing'\n/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.14/gems/activesupport-5.2.4.3/lib/active_support/dependencies/interlock.rb:41:in `running'\n/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.14/gems/actionpack-5.2.4.3/lib/action_controller/metal/live.rb:247:in `block in process'\n/opt/ood/ondemand/root/usr/share/gems/2.5/ondemand/1.7.14/gems/actionpack-5.2.4.3/lib/action_controller/metal/live.rb:291:in `block in new_controller_thread'"
App 2958 output: [2020-06-30 02:18:42 -0400 ] INFO "method=GET path=/pun/sys/activejobs/jobs.json format=json controller=JobsController action=index status=200 duration=8.51 view=0.00"
I’m still in the testing phase, building a proof of concept for the team, so I don’t yet have an FQDN or proper SSL set up.
Any tips would be appreciated.
EDIT: fixed typo, it’s clusters.d in /etc/