Problems trying to build from source using ansible role

jeff.ohrstrom · May 11, 2022, 11:18pm

I appreciate your patience on this; you’re likely the first user ever to see that page, as I just changed it last week. Sorry for all the issues.

dtenenba · May 11, 2022, 11:38pm

Not a problem. I am also a n00b so I expect issues, and most of them have to do with my lack of knowledge.

And of course I have more questions. I configured my cluster with:

# /etc/ood/config/clusters.d/my_cluster.yml
---
v2:
  metadata:
    title: "My Cluster"
  login:
    host: "ec2-[redacted].us-west-2.compute.amazonaws.com"
  job:
    adapter: "slurm"
    cluster: "my_cluster"
    bin: "/opt/slurm/bin/"
    conf: "/opt/slurm/etc/slurm.conf"
    # bin_overrides:
      # sbatch: "/usr/local/bin/sbatch"
      # squeue: ""
      # scontrol: ""
      # scancel: ""

I verified I could ssh (as ubuntu, the user I am logged in as) to the node defined there (also, since this is AWS, I tried both the public and private hostname; they both work). Incidentally, this is the same node where OOD is running. The bin and conf paths are correct. I re-ran /opt/ood/ood-portal-generator/sbin/update_ood_portal, bounced Apache, and restarted my PUN from the help menu.
But I can’t do anything related to my cluster. Jobs do not show up in the jobs page even though I kicked off a Slurm job manually and can see it with squeue. When I try to run a shell or desktop I get errors.

I did find this in /var/log/ondemand-nginx/ubuntu/error.log:

App 17845 output: [2022-05-11 23:21:18 +0000 ] FATAL "ActionController::InvalidAuthenticityToken (ActionController::InvalidAuthenticityToken):"

How do I fix this?

dtenenba · May 11, 2022, 11:46pm

Also I now see this:

App 19034 output: [2022-05-11 23:41:27 +0000 ] ERROR "OodCore::JobAdapterError: squeue: error: Problem talking to database\nsqueue: error: 'my_cluster' can't be reached now, or it is an invalid entry for --clus

I have to figure out the name of the cluster I spun up with AWS ParallelCluster. That’s probably the problem - one of them anyway.

jeff.ohrstrom · May 11, 2022, 11:56pm

You need to either enable SSL in Apache or follow the instructions here (if you read the develop SSL docs, they say FIXME-LINK-NEEDED, which would say something similar).

github.com/OSC/ondemand

HTTP 422 Error / InvalidAuthenticityToken (bug, opened by GloktarFR 09:43AM - 31 May 21 UTC, closed 02:38PM - 17 Jun 21 UTC)

After a fresh installation of OOD-2.0.9 on CentOS-7.9, I'm unable to use batch c…onnect applications. The navigation through the dashboard (file explorer, shell, etc.) is working well though. But every time I try to submit a job in an interactive session, I'm getting a HTTP 422 Error. In the /var/log/ondemand-nginx/<user>/error.log, I'm seeing this error message: ``` App 19736 output: [2021-05-31 09:51:01 +0200 ] WARN "Can't verify CSRF token authenticity." App 19736 output: [2021-05-31 09:51:01 +0200 ] INFO "method=POST path=/pun/sys/dashboard/batch_connect/sys/bc_desktop_3d/session_contexts format=html controller=BatchConnect::SessionContextsController action=create status=422 error='ActionController::InvalidAuthenticityToken: ActionController::InvalidAuthenticityToken' duration=0.73 view=0.00" App 19736 output: [2021-05-31 09:51:01 +0200 ] FATAL "" App 19736 output: [2021-05-31 09:51:01 +0200 ] FATAL "ActionController::InvalidAuthenticityToken (ActionController::InvalidAuthenticityToken):" App 19736 output: [2021-05-31 09:51:01 +0200 ] FATAL "" App 19736 output: [2021-05-31 09:51:01 +0200 ] FATAL "actionpack (5.2.6) lib/action_controller/metal/request_forgery_protection.rb:215:in `handle_unverified_request'\nactionpack (5.2.6) lib/action_controller/metal/request_forgery_protection.rb:247:in `handle_unverified_request'\nactionpack (5.2.6) lib/action_controller/metal/request_forgery_protection.rb:242:in `verify_authenticity_token'\nactivesupport (5.2.6) lib/active_support/callbacks.rb:426:in `block in make_lambda'\nactivesupport (5.2.6) lib/active_support/callbacks.rb:198:in `block (2 levels) in halting'\nactionpack (5.2.6) lib/abstract_controller/callbacks.rb:34:in `block (2 levels) in <module:Callbacks>'\nactivesupport (5.2.6) lib/active_support/callbacks.rb:199:in `block in halting'\nactivesupport (5.2.6) lib/active_support/callbacks.rb:513:in `block in invoke_before'\nactivesupport (5.2.6) lib/active_support/callbacks.rb:513:in `each'\nactivesupport (5.2.6) 
lib/active_support/callbacks.rb:513:in `invoke_before'\nactivesupport (5.2.6) lib/active_support/callbacks.rb:131:in `run_callbacks'\nactionpack (5.2.6) lib/abstract_controller/callbacks.rb:41:in `process_action'\nactionpack (5.2.6) lib/action_controller/metal/rescue.rb:22:in `process_action'\nactionpack (5.2.6) lib/action_controller/metal/instrumentation.rb:34:in `block in process_action'\nactivesupport (5.2.6) lib/active_support/notifications.rb:168:in `block in instrument'\nactivesupport (5.2.6) lib/active_support/notifications/instrumenter.rb:23:in `instrument'\nactivesupport (5.2.6) lib/active_support/notifications.rb:168:in `instrument'\nactionpack (5.2.6) lib/action_controller/metal/instrumentation.rb:32:in `process_action'\nactionpack (5.2.6) lib/action_controller/metal/params_wrapper.rb:256:in `process_action'\nactionpack (5.2.6) lib/abstract_controller/base.rb:134:in `process'\nactionview (5.2.6) lib/action_view/rendering.rb:32:in `process'\nactionpack (5.2.6) lib/action_controller/metal.rb:191:in `dispatch'\nactionpack (5.2.6) lib/action_controller/metal.rb:252:in `dispatch'\nactionpack (5.2.6) lib/action_dispatch/routing/route_set.rb:52:in `dispatch'\nactionpack (5.2.6) lib/action_dispatch/routing/route_set.rb:34:in `serve'\nactionpack (5.2.6) lib/action_dispatch/journey/router.rb:52:in `block in serve'\nactionpack (5.2.6) lib/action_dispatch/journey/router.rb:35:in `each'\nactionpack (5.2.6) lib/action_dispatch/journey/router.rb:35:in `serve'\nactionpack (5.2.6) lib/action_dispatch/routing/route_set.rb:840:in `call'\nrack (2.2.3) lib/rack/tempfile_reaper.rb:15:in `call'\nrack (2.2.3) lib/rack/etag.rb:27:in `call'\nrack (2.2.3) lib/rack/conditional_get.rb:40:in `call'\nrack (2.2.3) lib/rack/head.rb:12:in `call'\nactionpack (5.2.6) lib/action_dispatch/http/content_security_policy.rb:18:in `call'\nrack (2.2.3) lib/rack/session/abstract/id.rb:266:in `context'\nrack (2.2.3) lib/rack/session/abstract/id.rb:260:in `call'\nactionpack (5.2.6) 
lib/action_dispatch/middleware/cookies.rb:670:in `call'\nactionpack (5.2.6) lib/action_dispatch/middleware/callbacks.rb:28:in `block in call'\nactivesupport (5.2.6) lib/active_support/callbacks.rb:98:in `run_callbacks'\nactionpack (5.2.6) lib/action_dispatch/middleware/callbacks.rb:26:in `call'\nactionpack (5.2.6) lib/action_dispatch/middleware/debug_exceptions.rb:61:in `call'\nactionpack (5.2.6) lib/action_dispatch/middleware/show_exceptions.rb:33:in `call'\nlograge (0.11.2) lib/lograge/rails_ext/rack/logger.rb:15:in `call_app'\nrailties (5.2.6) lib/rails/rack/logger.rb:26:in `block in call'\nactivesupport (5.2.6) lib/active_support/tagged_logging.rb:71:in `block in tagged'\nactivesupport (5.2.6) lib/active_support/tagged_logging.rb:28:in `tagged'\nactivesupport (5.2.6) lib/active_support/tagged_logging.rb:71:in `tagged'\nrailties (5.2.6) lib/rails/rack/logger.rb:26:in `call'\nactionpack (5.2.6) lib/action_dispatch/middleware/remote_ip.rb:81:in `call'\nrequest_store (1.5.0) lib/request_store/middleware.rb:19:in `call'\nactionpack (5.2.6) lib/action_dispatch/middleware/request_id.rb:27:in `call'\nrack (2.2.3) lib/rack/method_override.rb:24:in `call'\nrack (2.2.3) lib/rack/runtime.rb:22:in `call'\nactivesupport (5.2.6) lib/active_support/cache/strategy/local_cache_middleware.rb:29:in `call'\nactionpack (5.2.6) lib/action_dispatch/middleware/executor.rb:14:in `call'\nrack (2.2.3) lib/rack/sendfile.rb:110:in `call'\nrailties (5.2.6) lib/rails/engine.rb:524:in `call'\n/opt/rh/ondemand/root/usr/share/ruby/vendor_ruby/phusion_passenger/rack/thread_handler_extension.rb:107:in `process_request'\n/opt/rh/ondemand/root/usr/share/ruby/vendor_ruby/phusion_passenger/request_handler/thread_handler.rb:157:in `accept_and_process_next_request'\n/opt/rh/ondemand/root/usr/share/ruby/vendor_ruby/phusion_passenger/request_handler/thread_handler.rb:110:in `main_loop'\n/opt/rh/ondemand/root/usr/share/ruby/vendor_ruby/phusion_passenger/request_handler.rb:416:in `block (3 levels) in 
start_threads'\n/opt/rh/ondemand/root/usr/share/ruby/vendor_ruby/phusion_passenger/utils.rb:113:in `block in create_thread_and_abort_on_exception'" ``` Any idea what's causing this ? I have this exact same installation with ondemand-1.8.20 and it's working fine.

It’s good that you spotted where to look, though. If you look for execve in the same logs, you can see the actual commands being issued.
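
For the SSL route, a minimal sketch of what enabling SSL could look like in the portal configuration, assuming the standard /etc/ood/config/ood_portal.yml file. The hostname and certificate paths are placeholders, not values from this thread:

```yaml
# /etc/ood/config/ood_portal.yml
---
# Placeholder hostname; use the name your certificate was issued for.
servername: "ondemand.example.com"
# Each entry is passed through as an Apache directive.
# Certificate paths are hypothetical.
ssl:
  - 'SSLCertificateFile "/etc/pki/tls/certs/ondemand.example.com.crt"'
  - 'SSLCertificateKeyFile "/etc/pki/tls/private/ondemand.example.com.key"'
```

After editing, re-run /opt/ood/ood-portal-generator/sbin/update_ood_portal and restart Apache so the change is picked up.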

dtenenba · May 12, 2022, 12:16am

OK, I will set up SSL, just being lazy.

But as for the cluster issue, it appears I need to provide a cluster name. I spun up this cluster using AWS ParallelCluster and it looks like it did not create a federated cluster. I can submit jobs from a terminal session just fine using sbatch. The cluster name is parallelcluster but when I submit a job with -M parallelcluster or --cluster parallelcluster I get an error:

sbatch: error: Problem talking to database
sbatch: error: 'parallelcluster' can't be reached now, or it is an invalid entry for --cluster.  Use 'sacctmgr list clusters' to see available clusters.

And the suggested command returns:

You are not running a supported accounting_storage plugin
Only 'accounting_storage/slurmdbd' is supported.

Not sure what’s involved in setting that up. So anyway, I get these errors manually, and also in the OOD logs. I’ll try writing some wrapper scripts that swallow the -M and --cluster options, unless you have another idea.

Thanks.
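
A sketch of how such wrappers could be wired in, using the bin_overrides keys that are commented out in the cluster config posted earlier in the thread. The wrapper paths are hypothetical; each wrapper would drop -M/--cluster and pass every other argument to the real binary under /opt/slurm/bin/:

```yaml
# /etc/ood/config/clusters.d/my_cluster.yml (job section only)
v2:
  job:
    adapter: "slurm"
    # Still set, so the -M flag is still passed; the wrappers discard it.
    cluster: "my_cluster"
    bin: "/opt/slurm/bin/"
    conf: "/opt/slurm/etc/slurm.conf"
    bin_overrides:
      # Hypothetical wrapper scripts, not shipped with OOD or Slurm.
      sbatch: "/usr/local/bin/sbatch_wrapper"
      squeue: "/usr/local/bin/squeue_wrapper"
      scontrol: "/usr/local/bin/scontrol_wrapper"
      scancel: "/usr/local/bin/scancel_wrapper"
```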

jeff.ohrstrom · May 12, 2022, 12:57pm

First find out what works in the CLI. I take it -M and --cluster don’t.

When you configured Slurm, you must have set the cluster attribute. That’s why it’s passing the -M flag. Get rid of the cluster attribute in the YAML configuration and it won’t pass the flag anymore.

https://osc.github.io/ood-documentation/latest/installation/resource-manager/slurm.html
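
Applied to the config posted earlier in the thread, that would look roughly like the following, assuming everything else stays unchanged:

```yaml
# /etc/ood/config/clusters.d/my_cluster.yml
---
v2:
  metadata:
    title: "My Cluster"
  login:
    host: "ec2-[redacted].us-west-2.compute.amazonaws.com"
  job:
    adapter: "slurm"
    # No "cluster" attribute here, so no -M flag is passed to the Slurm commands.
    bin: "/opt/slurm/bin/"
    conf: "/opt/slurm/etc/slurm.conf"
```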

dtenenba · May 12, 2022, 4:24pm

Thanks, but then I get an error when trying to start an interactive desktop:

But that’s ok, I think my workaround is working so far…

jeff.ohrstrom · May 12, 2022, 5:01pm

The issue you’ve linked there is new and specific to OOD. You’ve defined clusters in /etc/ood/config/clusters.d/. Let’s imagine you have this Slurm cluster as /etc/ood/config/clusters.d/cool_slurm_cluster.yml and you’re using Kubernetes in AWS with /etc/ood/config/clusters.d/cooler_k8s_cluster.yml.

That app expects you to specify cool_slurm_cluster or cooler_k8s_cluster in the form.yml file.

jeff.ohrstrom · May 12, 2022, 5:03pm

Apps can submit to one or more of your defined clusters (and heterogeneously too: we run our apps on both Slurm and Kubernetes, and the user chooses), but you have to tell our apps which of the clusters you’ve defined in clusters.d to submit to.
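
As a minimal sketch, using the made-up cluster file names from the example above, the app's form.yml would name the cluster by its file name in clusters.d without the .yml extension:

```yaml
# form.yml in the app's directory
---
# Matches /etc/ood/config/clusters.d/cool_slurm_cluster.yml
cluster: "cool_slurm_cluster"
```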

dtenenba · May 12, 2022, 5:31pm

Got it, I followed the instructions and was able to enter the cluster name in the form and submit.

system · November 8, 2022, 5:32pm

This topic was automatically closed 180 days after the last reply. New replies are no longer allowed.
