How to add a resource manager?

Hello all,

I’m trying to install OOD on Fugaku, a Japanese supercomputer.
Fugaku uses a proprietary resource manager developed by Fujitsu Limited.
Its syntax is similar to that of existing resource managers such as Slurm.

Could you please tell me how to add a new resource manager?
I suspect many people are in the same situation, so it would be even better if there were a manual.

It looks like the resource manager adapters are stored in
the directory /opt/ood/ondemand/root/usr/share/gems/2.7/ondemand/2.0.26/gems/ood_core-0.19.0/lib/ood_core/job/adapters/.

I think I need to create a new adapter file in that directory, but that alone does not seem to be enough.

Thanks,

Very cool! I’ve heard a lot about that system.

Can you elaborate on this? I believe the only requirements are that the name you’re trying to use matches the name of the actual .rb file, and that you implement the factory pattern, because that’s how adapters are instantiated.

Dynamically, based on the name, and through a factory.

Here’s a recent example of an additional adapter. Nothing else references it; that is, nothing requires these additional files.
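To make the factory point concrete, here is a minimal sketch of dispatch-by-adapter-name, assuming nothing about the real ood_core internals (TinyFactory and its builder method are illustrative names only):

```ruby
# Hedged sketch of the factory pattern used to instantiate adapters by name.
# TinyFactory and build_fujitsu_tcs are made-up names for demonstration.
module TinyFactory
  # Dispatch on the adapter name from the cluster config; this is why the
  # configured adapter name has to match a builder (and, in the real gem,
  # the adapter's .rb file name).
  def self.build(config)
    adapter = config.fetch(:adapter)
    # the real factory also loads the file: require "ood_core/job/adapters/#{adapter}"
    public_send("build_#{adapter}", config)
  end

  # Each adapter file reopens the factory and defines its own builder.
  def self.build_fujitsu_tcs(config)
    { adapter: "fujitsu_tcs", bin: config[:bin] }
  end
end

TinyFactory.build(adapter: "fujitsu_tcs", bin: "/usr/bin")
```

As I understand it, the real OodCore::Job::Factory works along these lines: it requires the adapter file by name and then calls a build_&lt;name&gt; method that the adapter file defines, which is why the file name and the configured adapter name must match.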

Also, as we’re finding out with Microsoft and an Azure-specific adapter, staying outside the main project may get tough for you. That is, you’ll always have to install and apply all sorts of patches.

We’re happy to pull a new adapter in, even if it’s only for a subset of users. Having this in the upstream project could make your life a lot easier, and that’s half of my job.

So in sum, if you get this going and want to use it we’re happy to include it in the distribution to make it easier for you.

Thank you for your help and quick reply.

Also, excuse me: the issue I’m having isn’t due to adding a new resource manager adapter; it seems to be a cluster configuration file problem.

First, let me explain the Fugaku system. Fugaku consists of compute nodes and pre/post nodes. The compute nodes use a proprietary resource manager, while the pre/post nodes use Slurm. I have confirmed that OOD works for the pre/post nodes.

When I copy the cluster configuration file for the pre/post nodes to one for the compute nodes (cd /etc/ood/config/clusters.d; cp pre-post.yml fugaku.yml) and launch the corresponding Interactive App, the following error occurs. Of course, I did not edit fugaku.yml.

#<ArgumentError: missing keywords: :id, :status>

/opt/ood/ondemand/root/usr/share/gems/2.7/ondemand/2.0.26/gems/ood_core-0.19.0/lib/ood_core/job/info.rb:89:in `initialize'
/opt/ood/ondemand/root/usr/share/gems/2.7/ondemand/2.0.26/gems/ood_core-0.19.0/lib/ood_core/job/adapters/slurm.rb:670:in `new'
/opt/ood/ondemand/root/usr/share/gems/2.7/ondemand/2.0.26/gems/ood_core-0.19.0/lib/ood_core/job/adapters/slurm.rb:670:in `handle_job_array'
/opt/ood/ondemand/root/usr/share/gems/2.7/ondemand/2.0.26/gems/ood_core-0.19.0/lib/ood_core/job/adapters/slurm.rb:481:in `info'
/var/www/ood/apps/sys/dashboard/app/models/batch_connect/session.rb:349:in `update_info'
/var/www/ood/apps/sys/dashboard/app/models/batch_connect/session.rb:340:in `info'
/var/www/ood/apps/sys/dashboard/app/models/batch_connect/session.rb:334:in `status'
/var/www/ood/apps/sys/dashboard/app/models/batch_connect/session.rb:407:in `completed?'
/var/www/ood/apps/sys/dashboard/app/models/batch_connect/session.rb:118:in `block in all'
/var/www/ood/apps/sys/dashboard/app/models/batch_connect/session.rb:117:in `map'
/var/www/ood/apps/sys/dashboard/app/models/batch_connect/session.rb:117:in `all'
/var/www/ood/apps/sys/dashboard/app/controllers/batch_connect/sessions_controller.rb:7:in `index'
/opt/ood/ondemand/root/usr/share/gems/2.7/ondemand/2.0.26/gems/actionpack-5.2.8/lib/action_controller/metal/basic_implicit_render.rb:6:in `send_action'
/opt/ood/ondemand/root/usr/share/gems/2.7/ondemand/2.0.26/gems/actionpack-5.2.8/lib/abstract_controller/base.rb:194:in `process_action'
/opt/ood/ondemand/root/usr/share/gems/2.7/ondemand/2.0.26/gems/actionpack-5.2.8/lib/action_controller/metal/rendering.rb:30:in `process_action'
/opt/ood/ondemand/root/usr/share/gems/2.7/ondemand/2.0.26/gems/actionpack-5.2.8/lib/abstract_controller/callbacks.rb:42:in `block in process_action'
/opt/ood/ondemand/root/usr/share/gems/2.7/ondemand/2.0.26/gems/activesupport-5.2.8/lib/active_support/callbacks.rb:132:in `run_callbacks'
/opt/ood/ondemand/root/usr/share/gems/2.7/ondemand/2.0.26/gems/actionpack-5.2.8/lib/abstract_controller/callbacks.rb:41:in `process_action'
/opt/ood/ondemand/root/usr/share/gems/2.7/ondemand/2.0.26/gems/actionpack-5.2.8/lib/action_controller/metal/rescue.rb:22:in `process_action'
/opt/ood/ondemand/root/usr/share/gems/2.7/ondemand/2.0.26/gems/actionpack-5.2.8/lib/action_controller/metal/instrumentation.rb:34:in `block in process_action'
/opt/ood/ondemand/root/usr/share/gems/2.7/ondemand/2.0.26/gems/activesupport-5.2.8/lib/active_support/notifications.rb:168:in `block in instrument'
/opt/ood/ondemand/root/usr/share/gems/2.7/ondemand/2.0.26/gems/activesupport-5.2.8/lib/active_support/notifications/instrumenter.rb:23:in `instrument'
/opt/ood/ondemand/root/usr/share/gems/2.7/ondemand/2.0.26/gems/activesupport-5.2.8/lib/active_support/notifications.rb:168:in `instrument'
/opt/ood/ondemand/root/usr/share/gems/2.7/ondemand/2.0.26/gems/actionpack-5.2.8/lib/action_controller/metal/instrumentation.rb:32:in `process_action'
/opt/ood/ondemand/root/usr/share/gems/2.7/ondemand/2.0.26/gems/actionpack-5.2.8/lib/action_controller/metal/params_wrapper.rb:256:in `process_action'
/opt/ood/ondemand/root/usr/share/gems/2.7/ondemand/2.0.26/gems/actionpack-5.2.8/lib/abstract_controller/base.rb:134:in `process'
/opt/ood/ondemand/root/usr/share/gems/2.7/ondemand/2.0.26/gems/actionview-5.2.8/lib/action_view/rendering.rb:32:in `process'
/opt/ood/ondemand/root/usr/share/gems/2.7/ondemand/2.0.26/gems/actionpack-5.2.8/lib/action_controller/metal.rb:191:in `dispatch'
/opt/ood/ondemand/root/usr/share/gems/2.7/ondemand/2.0.26/gems/actionpack-5.2.8/lib/action_controller/metal.rb:252:in `dispatch'
/opt/ood/ondemand/root/usr/share/gems/2.7/ondemand/2.0.26/gems/actionpack-5.2.8/lib/action_dispatch/routing/route_set.rb:52:in `dispatch'
/opt/ood/ondemand/root/usr/share/gems/2.7/ondemand/2.0.26/gems/actionpack-5.2.8/lib/action_dispatch/routing/route_set.rb:34:in `serve'
/opt/ood/ondemand/root/usr/share/gems/2.7/ondemand/2.0.26/gems/actionpack-5.2.8/lib/action_dispatch/journey/router.rb:52:in `block in serve'
/opt/ood/ondemand/root/usr/share/gems/2.7/ondemand/2.0.26/gems/actionpack-5.2.8/lib/action_dispatch/journey/router.rb:35:in `each'
/opt/ood/ondemand/root/usr/share/gems/2.7/ondemand/2.0.26/gems/actionpack-5.2.8/lib/action_dispatch/journey/router.rb:35:in `serve'
/opt/ood/ondemand/root/usr/share/gems/2.7/ondemand/2.0.26/gems/actionpack-5.2.8/lib/action_dispatch/routing/route_set.rb:840:in `call'
/opt/ood/ondemand/root/usr/share/gems/2.7/ondemand/2.0.26/gems/rack-2.2.3.1/lib/rack/tempfile_reaper.rb:15:in `call'
/opt/ood/ondemand/root/usr/share/gems/2.7/ondemand/2.0.26/gems/rack-2.2.3.1/lib/rack/etag.rb:27:in `call'
/opt/ood/ondemand/root/usr/share/gems/2.7/ondemand/2.0.26/gems/rack-2.2.3.1/lib/rack/conditional_get.rb:27:in `call'
/opt/ood/ondemand/root/usr/share/gems/2.7/ondemand/2.0.26/gems/rack-2.2.3.1/lib/rack/head.rb:12:in `call'
/opt/ood/ondemand/root/usr/share/gems/2.7/ondemand/2.0.26/gems/actionpack-5.2.8/lib/action_dispatch/http/content_security_policy.rb:18:in `call'
/opt/ood/ondemand/root/usr/share/gems/2.7/ondemand/2.0.26/gems/rack-2.2.3.1/lib/rack/session/abstract/id.rb:266:in `context'
/opt/ood/ondemand/root/usr/share/gems/2.7/ondemand/2.0.26/gems/rack-2.2.3.1/lib/rack/session/abstract/id.rb:260:in `call'
/opt/ood/ondemand/root/usr/share/gems/2.7/ondemand/2.0.26/gems/actionpack-5.2.8/lib/action_dispatch/middleware/cookies.rb:670:in `call'
/opt/ood/ondemand/root/usr/share/gems/2.7/ondemand/2.0.26/gems/actionpack-5.2.8/lib/action_dispatch/middleware/callbacks.rb:28:in `block in call'
/opt/ood/ondemand/root/usr/share/gems/2.7/ondemand/2.0.26/gems/activesupport-5.2.8/lib/active_support/callbacks.rb:98:in `run_callbacks'
/opt/ood/ondemand/root/usr/share/gems/2.7/ondemand/2.0.26/gems/actionpack-5.2.8/lib/action_dispatch/middleware/callbacks.rb:26:in `call'
/opt/ood/ondemand/root/usr/share/gems/2.7/ondemand/2.0.26/gems/actionpack-5.2.8/lib/action_dispatch/middleware/debug_exceptions.rb:61:in `call'
/opt/ood/ondemand/root/usr/share/gems/2.7/ondemand/2.0.26/gems/actionpack-5.2.8/lib/action_dispatch/middleware/show_exceptions.rb:33:in `call'
/opt/ood/ondemand/root/usr/share/gems/2.7/ondemand/2.0.26/gems/lograge-0.12.0/lib/lograge/rails_ext/rack/logger.rb:18:in `call_app'
/opt/ood/ondemand/root/usr/share/gems/2.7/ondemand/2.0.26/gems/railties-5.2.8/lib/rails/rack/logger.rb:26:in `block in call'
/opt/ood/ondemand/root/usr/share/gems/2.7/ondemand/2.0.26/gems/activesupport-5.2.8/lib/active_support/tagged_logging.rb:71:in `block in tagged'
/opt/ood/ondemand/root/usr/share/gems/2.7/ondemand/2.0.26/gems/activesupport-5.2.8/lib/active_support/tagged_logging.rb:28:in `tagged'
/opt/ood/ondemand/root/usr/share/gems/2.7/ondemand/2.0.26/gems/activesupport-5.2.8/lib/active_support/tagged_logging.rb:71:in `tagged'
/opt/ood/ondemand/root/usr/share/gems/2.7/ondemand/2.0.26/gems/railties-5.2.8/lib/rails/rack/logger.rb:26:in `call'
/opt/ood/ondemand/root/usr/share/gems/2.7/ondemand/2.0.26/gems/actionpack-5.2.8/lib/action_dispatch/middleware/remote_ip.rb:81:in `call'
/opt/ood/ondemand/root/usr/share/gems/2.7/ondemand/2.0.26/gems/request_store-1.5.1/lib/request_store/middleware.rb:19:in `call'
/opt/ood/ondemand/root/usr/share/gems/2.7/ondemand/2.0.26/gems/actionpack-5.2.8/lib/action_dispatch/middleware/request_id.rb:27:in `call'
/opt/ood/ondemand/root/usr/share/gems/2.7/ondemand/2.0.26/gems/rack-2.2.3.1/lib/rack/method_override.rb:24:in `call'
/opt/ood/ondemand/root/usr/share/gems/2.7/ondemand/2.0.26/gems/rack-2.2.3.1/lib/rack/runtime.rb:22:in `call'
/opt/ood/ondemand/root/usr/share/gems/2.7/ondemand/2.0.26/gems/activesupport-5.2.8/lib/active_support/cache/strategy/local_cache_middleware.rb:29:in `call'
/opt/ood/ondemand/root/usr/share/gems/2.7/ondemand/2.0.26/gems/actionpack-5.2.8/lib/action_dispatch/middleware/executor.rb:14:in `call'
/opt/ood/ondemand/root/usr/share/gems/2.7/ondemand/2.0.26/gems/rack-2.2.3.1/lib/rack/sendfile.rb:110:in `call'
/opt/ood/ondemand/root/usr/share/gems/2.7/ondemand/2.0.26/gems/railties-5.2.8/lib/rails/engine.rb:524:in `call'
/opt/rh/ondemand/root/usr/share/ruby/vendor_ruby/phusion_passenger/rack/thread_handler_extension.rb:107:in `process_request'
/opt/rh/ondemand/root/usr/share/ruby/vendor_ruby/phusion_passenger/request_handler/thread_handler.rb:149:in `accept_and_process_next_request'
/opt/rh/ondemand/root/usr/share/ruby/vendor_ruby/phusion_passenger/request_handler/thread_handler.rb:110:in `main_loop'
/opt/rh/ondemand/root/usr/share/ruby/vendor_ruby/phusion_passenger/request_handler.rb:419:in `block (3 levels) in start_threads'
/opt/rh/ondemand/root/usr/share/ruby/vendor_ruby/phusion_passenger/utils.rb:113:in `block in create_thread_and_abort_on_exception'
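As far as I understand, this ArgumentError comes from Ruby’s required keyword arguments: OodCore::Job::Info#initialize requires id: and status:, so it raises when the adapter’s parsed output supplies neither. A minimal stand-in, with MyInfo as an illustrative name only:

```ruby
# Illustrative stand-in for OodCore::Job::Info, whose initializer requires
# the :id and :status keywords (MyInfo is a made-up name for demonstration).
class MyInfo
  def initialize(id:, status:, **rest)
    @id = id
    @status = status
  end
end

begin
  # Calling with no attributes at all, e.g. when the scheduler output could
  # not be parsed into job fields, reproduces the error above.
  MyInfo.new
rescue ArgumentError => e
  puts e.message  # e.g. "missing keywords: :id, :status"
end
```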

The contents of pre-post.yml are as follows.

---
v2:
  metadata:
    title: "Pre/Post"
  login:
    host: "ondemand-test.fugaku.r-ccs.riken.jp"
    default: true
  job:
    adapter: "slurm"
    bin: "/usr/bin/"
    conf: "/etc/slurm/slurm.conf"

Is this procedure correct when using multiple job schedulers?

Best,

You can grep /var/log/ondemand-nginx/$USER/error.log for execve to find the actual squeue command being issued. I believe it’s very long, requesting a lot of fields and specifying the field separator. I don’t know what sort of compatibility there is between squeue and Fugaku’s scheduler, but you may have to run it by hand to see what’s going on.

Here’s an example of a command I just pulled from my system. Note that we’re using Unicode separator characters, not just any ordinary character.

App 46850 output: [2022-06-27 09:23:56 -0400 ]  INFO "execve = [{\"SLURM_CONF\"=>\"/etc/slurm/slurm.conf\"}, \"/usr/bin/squeue\", \"--all\", \"--states=all\", \"--noconvert\", \"-o\", \"\\u001E%a\\u001F%A\\u001F%B\\u001F%c\\u001F%C\\u001F%d\\u001F%D\\u001F%e\\u001F%E\\u001F%f\\u001F%F\\u001F%g\\u001F%G\\u001F%h\\u001F%H\\u001F%i\\u001F%I\\u001F%j\\u001F%J\\u001F%k\\u001F%K\\u001F%l\\u001F%L\\u001F%m\\u001F%M\\u001F%n\\u001F%N\\u001F%o\\u001F%O\\u001F%q\\u001F%P\\u001F%Q\\u001F%r\\u001F%S\\u001F%t\\u001F%T\\u001F%u\\u001F%U\\u001F%v\\u001F%V\\u001F%w\\u001F%W\\u001F%x\\u001F%X\\u001F%y\\u001F%Y\\u001F%z\\u001F%Z\\u001F%b\", \"-j\", \"11804162\", \"-M\", \"pitzer\"]"

We think of schedulers as independent systems, or at least independent clusters. Meaning, from our perspective, they have no relationship with each other.
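Concretely, that means one file per cluster under /etc/ood/config/clusters.d, each naming its own adapter. A hypothetical fugaku.yml for the compute nodes might look like this (the adapter name and paths are placeholders; an adapter by that name would have to exist in ood_core):

```yaml
---
v2:
  metadata:
    title: "Fugaku Compute"
  login:
    host: "ondemand-test.fugaku.r-ccs.riken.jp"
  job:
    adapter: "fujitsu_tcs"  # placeholder: must match an adapter in ood_core
    bin: "/usr/bin/"        # placeholder path to the scheduler's CLI tools
```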

I have created an adapter for the proprietary resource manager, fujitsu_tcs, which has been merged into the OOD master repository.

To support the resource manager, I developed one new file and modified two existing files.
Please refer to the pull request below for details.

Thanks,