Globally defined queues in cluster config and LSF 10

I am trying to follow the docs to create a “Global Static List” of queues in the cluster configuration.
We use LSF 10.2.0.9 and OnDemand 2.0.29, and do not use multiple clusters.

Since we have only one cluster, there is only one file under /etc/ood/config/clusters.d/rivm_hpc.yml.
My understanding is that the name of this file defines a cluster “rivm_hpc”, so it should not be necessary to include the lines

  job:
     cluster: "something"

However, when I omit these lines I can't access the cluster configuration in form.yml.erb:

  queues = OodAppkit.clusters[:rivm_hpc].custom_config[:queues]

The result is nil (and consequently exceptions when iterating over it).
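Until the root cause is found, the lookup can be made nil-safe so the form still renders instead of raising. A minimal sketch, using a plain hash to stand in for OodAppkit.clusters (in the real form.yml.erb the lookup would be something like OodAppkit.clusters[:rivm_hpc]&.custom_config&.dig(:queues)):

```ruby
# Stand-in for OodAppkit.clusters, to illustrate the nil-safe pattern only.
clusters = { rivm_hpc: nil }                          # simulate the nil case
queues   = clusters[:rivm_hpc]&.dig(:custom, :queues) || []
queues.each { |q| puts q }                            # safe: iterates zero times
```

The `&.` safe navigation and the `|| []` fallback guarantee that `queues` is always an array, so `.each` never hits a NoMethodError on nil.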

When I do include a cluster-name in the cluster-config, I see the defined queues.
But then job submission does not work: a parameter is added to the bsub command:

  bsub -m rivm_hpc

This leads to:

  rivm_hpc: Bad host name, host group name or cluster name. Job not submitted.

When I remove the cluster name from the config and define the queues locally in form.yml.erb, I can submit (without the -m parameter).

Shouldn't OodAppkit.clusters[:rivm_hpc] always give me access to the configuration, regardless of the cluster name in the cluster config?

I cannot replicate this. Your understanding of the issue is correct: one thing (the job.cluster configuration) shouldn't have anything to do with the other (the name of the cluster).

I tried to replicate with this file

---
v2:
  metadata:
    title: "LSF Test"
    url: "https://www.osc.edu/supercomputing/computing/owens"
    hidden: false
  login:
    host: "owens.osc.edu"
  job:
    adapter: "lsf"
  custom:
    queues:
      - a
      - b
      - c

This form


clusters:
  - 'lsf_test'

form:
  - custom_queue

attributes:
  custom_queue:
    widget: 'select'
    options:
      <%- OodAppkit.clusters[:lsf_test].custom_config[:queues].each do |queue| -%>
        - "<%= queue %>"
      <%- end -%>
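As a quick sanity check outside of OnDemand, the same ERB loop can be rendered with plain Ruby (a sketch; the queues array stands in for OodAppkit.clusters[:lsf_test].custom_config[:queues]):

```ruby
require 'erb'

# Stand-in for OodAppkit.clusters[:lsf_test].custom_config[:queues]
queues = %w[a b c]

# trim_mode: '-' enables the whitespace-trimming <%- ... -%> tags,
# which OnDemand's form templates rely on.
template = ERB.new(<<~TPL, trim_mode: '-')
  options:
  <%- queues.each do |queue| -%>
    - "<%= queue %>"
  <%- end -%>
TPL

puts template.result(binding)
# options:
#   - "a"
#   - "b"
#   - "c"
```

This makes it easy to verify that the loop emits valid YAML before wiring it into the form.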

And got this dropdown menu:

[screenshot: dropdown listing a, b, and c]

There’s likely something we’re missing here, something else that’s throwing this off. I do know for sure in 2.0, you can’t use queue as a field name.

Can you share the form.yml.erb you’re using with the custom_config setting?

my cluster definition:

# /etc/ood/config/clusters.d/rivm_hpc.yml
---
v2:
  metadata:
    title: "RIVM HPC"
  login:
    host: "rivm-biohn-l01a.rivm.ssc-campus.nl"
    default: true
  job:
    adapter: "lsf"
#  cluster: "rivm_hpc"
    [...]
  custom:
    queues:
       - ["bio-prio","Priority (<1h)"]
       - ["bio","Bio"]

my form.yml.erb:

<%-
  require 'open3'

  # why doesn't it work?
  global_queues = OodAppkit.clusters[:rivm_hpc].custom_config[:queues]

  #experimental
  #user_queues=Open3.capture3( "bash -c /opt/ibm/lsfsuite/lsf/10.1/linux2.6-glibc2.3-x86_64/bin/bqueues -o 'QUEUE_NAME' -noheader") 
  #queues = global_queues.select { |q| user_queues.include?(q[0]) }
-%>
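Two things are worth noting about the commented-out experiment above: Open3.capture3 returns a [stdout, stderr, status] triple rather than a single string, and the bash -c quoting passes only the binary path to -c, dropping the arguments. A hedged sketch of how the filter could look (untested against a live LSF install; the bqueues path and flags are taken from the commented-out line, and the queue names from the config above):

```ruby
require 'open3'

# Sketch: list the queues visible to the current user via bqueues,
# falling back to an empty list if the command is missing or fails.
def available_queues(cmd = ['/opt/ibm/lsfsuite/lsf/10.1/linux2.6-glibc2.3-x86_64/bin/bqueues',
                            '-o', 'QUEUE_NAME', '-noheader'])
  out, _err, status = Open3.capture3(*cmd)  # capture3 returns three values
  status.success? ? out.lines.map(&:strip) : []
rescue Errno::ENOENT
  []                                        # binary not installed
end

global_queues = [['bio-prio', 'Priority (<1h)'], ['bio', 'Bio']]
user_queues   = available_queues            # call once, not per element
queues = global_queues.select { |q, _label| user_queues.include?(q) }
```

Passing the command as an argument array avoids the shell entirely, so no bash -c quoting is needed.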
---
cluster: "rivm_hpc"
attributes:
  container_version: "ood_jupyter_datascience-notebook:python-3.10.9.sif"
  custom_label:
    label: Label
    help: ""
    widget: "text_field"
    required: false
    cachable: false
  custom_queue:
    widget: "select"
    label: Queue
    help: Select one of the globally defined queues
    options:
    <%- global_queues.each do |q,name| -%>
      - [ "<%= name %>","<%= q %>" ]
    <%- end -%>
  
form:
  - custom_label
  - custom_queue
  - container_version

There’s something else going on here. I’m quite sure that adding cluster to that rivm_hpc.yml will not impact this functionality.

One thing here that could be affecting you is caching. Be sure to restart your web server every time you edit your clusters.d file; otherwise it will continue to use the cached version.

When I debug, I like to raise a StandardError to inspect what's going on:

<%-
  # see what this object looks like
  raise StandardError, OodAppkit.clusters[:rivm_hpc].custom_config[:queues]

  # see what this cluster object looks like
  raise StandardError, OodAppkit.clusters[:rivm_hpc].inspect
-%>
---

I got this to work straight away and indeed see

[screenshot: the queues array]

and inspecting the cluster object, we can see

[screenshot: the inspected cluster object]

This topic was automatically closed 180 days after the last reply. New replies are no longer allowed.