Customizing MATLAB form.yml and submit.yml.erb, simplifying default; different versions referenced

I’m trying to customize the form fields to our cluster and simplify it. All the extra node_type and cores_lookup make it a bit difficult to know what can be removed.

The Git page has several more options vs the tutorial page.

I can’t get the versions of MATLAB to appear in the dropdown.

EDIT: I did update form.yml:

form:
  - version

and at the bottom:

  version:
    widget: select
    label: "MATLAB version"
    help: "This defines the version of MATLAB you want to load."
    options:
      - [ "R2022a", "matlab/r2022a" ]
      - [ "R2021a", "matlab/r2021a" ]
      - [ "R2019b", "matlab/r2019b" ]
      - [ "R2018b", "matlab/r2018b" ]
      - [ "R2018a", "matlab/r2018a" ]

I do see in template/script.sh.erb this section where I had to hard code the latest version.

# Launch MATLAB
<%- if gpu -%>
#module load intel/16.0.3 virtualgl
#module load matlab/2022a
module list

The suggestion here says to

Look for json files in the /etc/reporting/modules directory.

module_file_dir: "/etc/reporting/modules"

That directory does not exist. I created it and the file.

I kind of like Utah’s simplified form but ran into several errors such as:
ERROR "ERROR: NoMethodError - undefined method `[]' for nil:NilClass"

I’m not following this section:

to any ERB files in ~/ondemand/dev/bc_my_center_matlab/template/* and ~/ondemand/dev/bc_my_center_matlab/submit.yml.erb . version allows the user to select what version of MATLAB they want to run, and the second value corresponds to OSC’s module names.

Those MATLAB versions are not on the Git page, only in the tutorial.
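For what it's worth, here is a rough Ruby sketch of how I read that tutorial passage: each option is a [label, value] pair, the user sees the label, and the value is what an ERB file can reference as version. The attributes: parent key, the two-option list, and the one-line script fragment are assumptions for illustration, not copied from either app.

```ruby
require "erb"
require "yaml"

# Assumed shape of the relevant part of form.yml (trimmed to two options).
FORM_FRAGMENT = <<~YAML
  attributes:
    version:
      widget: select
      options:
        - [ "R2022a", "matlab/r2022a" ]
        - [ "R2021a", "matlab/r2021a" ]
YAML

# Hypothetical line from template/script.sh.erb that uses the selected value
# instead of a hard-coded module name.
SCRIPT_FRAGMENT = "module load <%= version %>\n"

# Return only the second element of each option pair (the module names).
def module_values(form_yaml)
  options = YAML.safe_load(form_yaml).dig("attributes", "version", "options")
  options.map { |_label, value| value }
end

# Render the script fragment with `version` bound to the chosen value.
def render_script(version)
  ERB.new(SCRIPT_FRAGMENT, trim_mode: "-").result(binding)
end

puts module_values(FORM_FRAGMENT).inspect
puts render_script(module_values(FORM_FRAGMENT).first)
```

If this reading is right, the hard-coded module load line in script.sh.erb could take the form value instead, provided the second element of each pair matches a module name on your cluster.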

Also, I see these warnings after a restart. Where is it looking for these files?

App 403997 output: [2024-10-28 15:28:40 -0400 ]  WARN "File /axon.json is unreadable."
App 403997 output: [2024-10-28 15:28:40 -0400 ]  WARN "File /slurmdev.json is unreadable."

And lastly this warning:

WARN "Error opening MOTD at \nException: bad URI(is not URI?): nil

@jeff.ohrstrom I’ve attached the .json file. Is there something off with it?
slurmdev.json (22.1 KB)

Also, @jeff.ohrstrom, where should module_file_dir go? What indent should it have, and in which section of the clusters.d file should it be placed?

I followed the suggestion here,

OOD_BC_DYNAMIC_JS=TRUE
OOD_MODULE_FILE_DIR: /share/modulesfiles

in /etc/ood/config/apps/dashboard/env, which did not exist, so I had to create it.

If you put the configuration in yaml (in the ondemand.d directory) it’s yaml, lowercase and no OOD prefix.

module_file_dir: /some/directory

If it’s in an environment file (env) then it’s an environment variable with different syntax, an uppercase key and the OOD prefix.

OOD_MODULE_FILE_DIR='/some/directory'

Yes this is what I have. What is the spacing/indent and in what section of the .yml file should it be in?

It’s not a YAML file in /etc/ood/config/clusters.d; it’s a YAML file in /etc/ood/config/ondemand.d.
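To make the indent question concrete, here is a small Ruby sketch under a few assumptions: the key sits at the top level of the YAML file with no indent (as in the snippets quoted in this thread), the file name in the comment is made up, and the path-building helper only mirrors what the log lines look like rather than the actual OnDemand source.

```ruby
require "yaml"

# Hypothetical contents of a file such as /etc/ood/config/ondemand.d/modules.yml.
# The file name is an assumption; the key is top level and unindented.
ONDEMAND_D_EXAMPLE = <<~YAML
  module_file_dir: "/etc/reporting/modules"
YAML

# Build "<module_file_dir>/<cluster>.json". This is a guess based on the
# warnings in the log, not code taken from OnDemand.
def module_json_path(config, cluster)
  "#{config['module_file_dir']}/#{cluster}.json"
end

puts module_json_path(YAML.safe_load(ONDEMAND_D_EXAMPLE), "axon")
puts module_json_path({}, "axon") # setting missing or not picked up
```

With the setting present the sketch prints /etc/reporting/modules/axon.json, and without it the path collapses to /axon.json. That matches the shape of the "File /axon.json is unreadable." warnings, so one possible reading is that the directory setting was not being picked up at the time.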

To simplify the form and use only one cluster, I am editing the form.yml and submit.yml.erb from Utah. How is the sbatch command built from the Node type dropdown? Using ‘any node’ works with both Utah’s version and the OSC default, but all of the other options result in:

App 867125 output: [2024-11-04 15:07:29 -0500 ] ERROR "ERROR: NoMethodError - undefined method `[]' for nil:NilClass"
App 867125 output: [2024-11-04 15:07:29 -0500 ]  INFO "execve = [\"git\", \"describe\", \"--always\", \"--tags\"]"
App 867125 output: [2024-11-04 15:07:29 -0500 ]  INFO "method=POST path=/pun/sys/dashboard/batch_connect/sys/bc_osc_matlab/session_contexts format=html controller=BatchConnect::SessionContextsController action=create status=200 allocations=26568 duration=27.62 view=13.71"

I would think this passes the -C option for constraint. Perhaps this error message could be a little more intuitive?

Also clicking the link to stage root directory breaks with:
Error occurred when attempting to access /pun/sys/dashboard/files/fs/home/myuser/ondemand/data/sys/dashboard/batch_connect/sys/bc_osc_matlab/output/12dfb7df-ceea-4bd3-ad9d-2c462d9aa5be

Cannot read file /home/myuser/ondemand/data/sys/dashboard/batch_connect/sys/bc_osc_matlab/output/12dfb7df-ceea-4bd3-ad9d-2c462d9aa5be

App 867125 output: [2024-11-04 15:11:08 -0500 ]  WARN "failed to determine mime type for file: /home/myuser/ondemand/data/sys/dashboard/batch_connect/sys/bc_osc_matlab/output/12dfb7df-ceea-4bd3-ad9d-2c462d9aa5be due to error not valid mimetype: cannot open `/home/myuser/ondemand/data/sys/dashboard/batch_connect/sys/bc_osc_matlab/output/12dfb7df-ceea-4bd3-ad9d-2c462d9aa5be' (No such file or directory)"

App 867125 output: [2024-11-04 15:11:08 -0500 ] ERROR "Cannot read file /home/myuser/ondemand/data/sys/dashboard/batch_connect/sys/bc_osc_matlab/output/12dfb7df-ceea-4bd3-ad9d-2c462d9aa5be"

Also, in the output of scontrol show job, why does the job show 2 CPUs and 14 GB of memory when only 1 CPU was selected in the form?

   NumNodes=1 NumCPUs=2 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=2,mem=14G,node=1,billing=2
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
   JOB_GRES=(null)
     Nodes=ax08 CPU_IDs=0-1 Mem=14336 GRES=

user_defined_context.json shows:

  "auto_modules_matlab": "matlab/2022a",
  "auto_accounts": "admin",
  "bc_num_hours": "1",
  "num_cores": "1",
  "node_type": "any",
  "bc_email_on_started": "0"

I would suggest that, instead of copying things directly, you start with the most basic thing that works with defaults (think of sbatch -A account and that’s it), then start to add support for additional fields like node_type.

Start with something that works well, then build on that. Otherwise you’ll run into errors like that, where I can’t even begin to figure out what’s what because of all the edits that I can’t see.
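As a general Ruby illustration of how that NoMethodError tends to come about (not a diagnosis of the edited files, which aren't visible here): a hash lookup returns nil for a key it doesn't know, and indexing into that nil fails. The table name, its keys, and the constraint values below are all invented for the sketch; only the --constraint flag and the node_type and num_cores field names come from this thread.

```ruby
# Hypothetical lookup table keyed by the node_type form value.
NODE_CONSTRAINTS = {
  "any" => { "constraint" => nil },
  "gpu" => { "constraint" => "gpu" }
}.freeze

# Build sbatch arguments, adding --constraint only when one is defined.
def slurm_args_for(node_type, num_cores)
  args = ["--ntasks-per-node", num_cores.to_i.to_s]
  entry = NODE_CONSTRAINTS[node_type] # nil when the form value has no entry
  # Unguarded, NODE_CONSTRAINTS[node_type]["constraint"] raises NoMethodError
  # for a node type that is missing from the table.
  constraint = entry && entry["constraint"]
  args += ["--constraint", constraint] if constraint
  args
end

p slurm_args_for("any", "1")
p slurm_args_for("gpu", "2")
p slurm_args_for("not-in-table", "1")
```

If the dropdown values in form.yml and the keys of whatever lookup the submit.yml.erb uses have drifted apart during editing, the unguarded form of this lookup would fail for every option except the ones that still match.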

Not sure why this could happen, but the error message is clear: the folder doesn’t exist. Or at least it didn’t at the time you clicked the link. Maybe there’s an NFS propagation latency?

Don’t know off the top of my head, but it shows 1 task and 1 CPU/Task. Maybe there’s something on the Slurm side that enforces a minimum of 2 cores.

Great, are there some examples of a minimum config, or do I just comment out all these form options?

It’s your submit.yml that’s giving you issues.

Start with this.

<%-
  slurm_args = []
%>
---
batch_connect:
  template: vnc
script:
  native:
  <%- slurm_args.each do |arg| %>
    - "<%= arg %>"
  <%- end %>

Even with this, no arguments from your form are being used yet. Add them to the slurm_args array one by one, like so:

slurm_args = ["--ntasks-per-node", num_cores.to_i]
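One way to check each step before submitting a job is to render the template locally and look at the resulting YAML. The sketch below does that with Ruby's standard ERB and YAML libraries. The render_submit helper is made up for this example, and trim mode "-" is an assumption about how the template is rendered, so treat it as a rough check rather than what the dashboard does.

```ruby
require "erb"
require "yaml"

# The minimal template from this thread, with one form value wired in.
SUBMIT_TEMPLATE = <<~'TEMPLATE'
  <%-
    slurm_args = ["--ntasks-per-node", num_cores.to_i]
  %>
  ---
  batch_connect:
    template: vnc
  script:
    native:
    <%- slurm_args.each do |arg| %>
      - "<%= arg %>"
    <%- end %>
TEMPLATE

# num_cores stands in for the form value, which arrives as a string like "1".
# trim_mode "-" is an assumption made for this sketch.
def render_submit(num_cores)
  ERB.new(SUBMIT_TEMPLATE, trim_mode: "-").result(binding)
end

rendered = render_submit("1")
puts rendered
p YAML.safe_load(rendered).dig("script", "native")
```

Because each element is wrapped in quotes in the template, every entry under native ends up as a string, including the core count. Adding one argument at a time and re-running a check like this should show which addition breaks the rendering.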