Double CPUs get allocated

Hi, I have been trying for a while now to figure out what I am missing, but I can't find it.

My submit.yml.erb looks like this:

---
batch_connect:
  template: basic
script:
  queue_name: <%= custom_queue %>
  native:
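    # each element below is passed straight through to sbatch as a separate CLI argument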
    - "--nodes"
    - "<%= bc_num_slots.blank? ? 1 : bc_num_slots.to_i %>"
    - "--ntasks"
    - "<%= num_ntasks.blank? ? 1 : num_ntasks.to_i %>"
    - "--cpus-per-task"
    - "<%= num_cpus.blank? ? 1 : num_cpus.to_i %>"
    - "--mem"
    - "<%= num_mem.blank? ? 1 : num_mem.to_i %>G"
    <%- unless nodelist.blank? -%>
    - "--nodelist"
    - "<%= nodelist %>"
    <%- end -%>

    <%- unless email.blank? -%>
    - "--mail-user"
    - "<%= email %>"
    - "--mail-type"
    - "BEGIN,END,FAIL"
    <%- end -%>

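    # --hint nomultithread asks Slurm to use only one thread per physical core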
    - "--hint"
    - "nomultithread"

    <%- if num_gpus.to_i > 0 -%>
    - "--gpus-per-node"
    - "<%= num_gpus.to_i %>"
    <%- end -%>

    <%- if custom_queue.to_s == "backfill" -%>
    - "--requeue"
    <%- end -%>


and my form.yml.erb looks like this:

<%-
require 'open3'
require 'json'
partition_error = nil
begin
    # read partitions.json and get the partitions value
    file = File.read(File.join(__dir__, 'partitions.json'))
    partitions_hash = JSON.parse(file)
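    # NOTE: hypothetical sketch of the shape partitions.json is assumed to have:
    # partition names mapped to the option metadata emitted per queue further
    # down in this form, e.g.
    #   { "interactive": { "data-max-num-cpus": 16 }, "backfill": {} }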
    # Command to Run
    script = 'sinfo -h --format="%P"'
    # Create a partitions array to dynamically populate the queues associated with the user
    partitions = []
    # Store the output and status
    output, status = Open3.capture2('bash', stdin_data: script)
    # puts status
    if status.success?
        # Split the output at '\n' and add each matching queue to the partitions array.
        output.split("\n").each do |queue|
            queue = queue.gsub("*", "")
            if partitions_hash.has_key?(queue)
                partitions.push(queue)
            end
        end
        # puts partitions
    else
        partition_error = "Error"
    end
rescue => e
    partition_error = e.message.strip
end
-%>


# Batch Connect app configuration file
#
# @note Used to define the submitted cluster, title, description, and
#   hard-coded/user-defined attributes that make up this Batch Connect app.
---

# **MUST** set cluster id here that matches cluster configuration file located
# under /etc/ood/config/clusters.d/*.yml
# @example Use the Owens cluster at Ohio Supercomputer Center
#     cluster: "owens"
cluster: "omnia"

# Define attribute values that aren't meant to be modified by the user within
# the Dashboard form
cluster: "omnia"
form:
  - custom_queue
  - mode
  - profiles
  - working_dir
  - num_cpus
  - num_ntasks
  - gpu_options
  - num_gpus
  - num_mem
  - nodelist
  - bc_account
  - bc_num_slots
  - bc_num_hours
  - sub_type
  - bind_paths
  - version
  - sif_file
  - conda_path
  - conda_name
  #- extra_jupyter_args
  - email

submit: submit.yml.erb
id: submitForm
name: submitForm
title: Jupyter Lab

attributes:
  bc_num_slots:
    label: "Number of nodes"
  # Set the corresponding modules that need to be loaded for Jupyter to run
  #
  # @note It is called within the batch job as `module load <modules>` if
  #   defined
  # @example Do not load any modules
  #     modules: ""
  # @example Using default python module
  #     modules: "python"
  # @example Using specific python module
  #     modules: "python/3.5"
  # @example Using combination of modules
  #     modules: "python/3.5 cuda/8.0.44"

  custom_queue:
    label: Partition
    widget: select
    value: interactive
    cacheable: false
    help: |
      - [Partition Documentation](http://hpc-docs.iee.fraunhofer.de/partitions/)
    <%- if partition_error || partitions.blank? -%>
      <div class="text-danger">Error while fetching Partition. Please contact support!</div>
    <%- else -%>
    options:
    <%- partitions.each do |q| -%>
      - [
          "<%= q %>", "<%= q %>",
          <%= JSON.generate(partitions_hash[q]) %>
        ]
    <%- end -%>
    <%- end -%>

  # working_dir:
  #   label: Working Directory
  #   help: "Optionally select your Jupyter project directory The default is your home directory ($HOME) when left empty."
  #   cacheable: false
  #   data-filepicker: true
  #   data-target-file-type: dirs
  #   readonly: false

# Working_dir
  working_dir:
    widget: "path_selector"
    directory: "/mnt/"
    show_hidden: true
    show_files: true
    favorites: false
    help: |
      Optionally select your Jupyter project directory. The default is your home directory ($HOME) when left empty.

  mode:
    label: Mode
    help: "Choose between simple pre-defined profiles or advanced self configuration"
    widget: select
    options:
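      # data-hide-* entries hide the named form fields while this option is selected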
      - [ "simple", "x",
          data-hide-gpu-options: true,
          data-hide-num-cpus: true,
          data-hide-num-mem: true,
          data-hide-num-gpus: true
        ]
      - [ "advanced", "y",
          data-hide-profiles: true
        ]

  profiles:
    label: Profile
    help: "Choose a profile"
    widget: select
    options:
      - [ "4 CPU, 8GB RAM", "4c8r",
        
        ]
      - [ "8 CPU, 16GB RAM", "8c16r",
        
        ] 
      - [ "16 CPU, 32GB RAM", "16c32r",
        
        ]


  sub_type:
    label: Submission Environment
    widget: select
    help: Select desired submission environment.
    cacheable: false
    options:
      - [
          "Mod (Basic)", "mod_basic",
          data-hide-sif-file: true,
          data-hide-conda-path: true,
          data-hide-conda-name: true,
          data-hide-bind-paths: true
        ]
      - [
          "Apptainer Container", "sif_basic",
          data-hide-conda-path: true,
          data-hide-version: true,
          data-hide-conda-name: true
        ]
      - [
          "Custom Conda Enviroment", "conda_env",
          data-hide-sif-file: true,
          data-hide-version: true,
          data-hide-bind-paths: true
        ]   

  version:
    label: Jupyter Kernel
    help: Select the desired Jupyter Kernel
    cacheable: false
    widget: select
    options:
      - [ "Python 3.10 Kernel", "py310" ]
      - [ "Tensorflow 2.12", "tensorflow/2.12"]
      - [ "Pytorch 2.0", "pytorch/2.0"]

  # sif_file:
  #   label: Container File
  #   help: |
  #     Select an Apptainer/Singularity Container **(.sif/.simg) that includes Jupyterlab**. This is required when using a Container Submission Environment! **pip install jupyterlab** inside your container.
  #   cacheable: false
  #   data-filepicker: true
  #   data-target-file-type: files
  #   data-target-file-pattern: '(.simg|.sif)$'
  #   readonly: true

# Containers
  sif_file:
    widget: "path_selector"
    directory: "/mnt/"
    show_hidden: true
    show_files: true
    favorites: false
    help: |
      Select an Apptainer/Singularity container **(.sif/.simg) that includes JupyterLab**. This is required when using a container submission environment! Run **pip install jupyterlab** inside your container.

  bind_paths:
    widget: "text_field"
    label: "Folders to bind into your container"
    value: ""
    help: |
      - Bind additional directories, default: $HOME
      - example: /data:/mnt binds your /data/ folder to /mnt inside the container

  # conda_path:
  #   label: Conda Enviroment
  #   help: |
  #     Select a Conda Enviroment : path to the **/bin folder that includes Jupyterlab**. This is required when using a Container Submission Environment! Use **pip install jupyterlab** inside your conda enviroment.
  #   cacheable: false
  #   data-filepicker: true
  #   data-show-hidden: true
  #   data-target-file-type: dirs
  #   data-target-file-pattern: 'bin'
  #   readonly: true
  #   initialdir: $HOME

  conda_path:
    widget: "path_selector"
    directory: "/mnt/"
    show_hidden: true
    show_files: true
    favorites: false
    help: |
      Select a conda environment: the path to the **/bin folder that includes JupyterLab**. This is required when using the custom conda submission environment! Use **pip install jupyterlab** inside your conda environment.

  conda_name:
    widget: "text_field"
    label: "Name of your conda Env"
    value: "envXY"
    help: |
      - Name of your custom conda environment **(required!)**
      
  num_cpus:
    label: CPUs (Cores)
    cacheable: false
    widget: number_field
    max: 16
    min: 1
    step: 1
    value: 1

  num_ntasks:
    label: Number of tasks
    cacheable: false
    widget: number_field
    min: 1
    step: 1
    value: 1

#GPU Options

  gpu_options:
    widget: select
    label: "GPU type"
    help: "Choose between Full GPU use and MIG Mode"
    options:
      - [
        "No GPU", "0",
         data-hide-num-gpus: true
        ]
      - ["A100 40GB (1 GPU)", "a100"]
   

  jupyter_type:
    widget: select
    label: "Jupyter Session Type"
    help: "Choose between Jupyter Notebook and Jupyter Lab"
    options:
      - ["Jupyter Notebook", "jupyter notebook"]
      - ["Jupyter Lab", "jupyter-lab"]
  

#Number of GPUs

  num_gpus:
    label: "GPUs"
    help: "Number of GPU or Cuda devices"
    cacheable: false
    widget: number_field
    max: 2
    min: 0
    step: 1
    value: 0

#Memory allocation
  num_mem:
    label: Memory (GB/Gigabytes)
    help: The total memory is shared by all assigned threads!
    cacheable: false
    widget: number_field
    max: 64
    min: 4
    step: 1
    value: 4
    
  email:
    label: Email Address
    help: |
      Enter your email address if you would like to receive job notifications (start, finished, failed, etc.). Otherwise, leave the field empty.

When I start a Jupyter notebook from the UI and choose 4 CPUs, for example, it allocates 8, while the content of job_script_options.json shows the correct amount of 4:

{
  "native": [
    "--nodes", "1",
    "--ntasks", "1",
    "--cpus-per-task", "4",
    "--mem", "16G",
    "--mail-user", "xxxx@xxxx",
    "--mail-type", "BEGIN,END,FAIL",
    "--hint", "nomultithread"
  ],
  "wall_time": 3600,
  "queue_name": "progress"
}

And `scontrol show job` shows:

ReqTRES=cpu=4,mem=4G,node=1,billing=4
**AllocTRES=cpu=8,mem=4G,node=1,billing=8**

Does somebody know why this is happening?

Kind Regards
Kreefd

I wonder if this `--hint nomultithread` is throwing you off.

Also, I found these on Stack Overflow. Maybe it's a setting in the queue you're being dropped into, or a cluster-wide configuration that's off.

Do the compute nodes have hyperthreading turned on? If so, do you have the nodes configured for HT in Slurm?
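
You can check with something like this (a quick sketch; the node name is a placeholder):

```bash
# what Slurm has configured for the node (ThreadsPerCore=2 means HT is counted)
scontrol show node node001 | grep -oE '(Sockets|CoresPerSocket|ThreadsPerCore|CPUTot)=[0-9]+'

# what the hardware on the node itself reports
lscpu | grep -E '^(Socket|Core|Thread)'
```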

> I wonder if this `--hint nomultithread` is throwing you off.
> Also, I found these on Stack Overflow. Maybe it's a setting in the queue you're being dropped into, or a cluster-wide configuration that's off.

Thank you, yes, those were the lines that were causing the double allocation.
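
In case it helps others, a minimal way to confirm the behavior outside of OnDemand (a sketch, assuming sbatch access and that TRES accounting is enabled; `<jobid>` is a placeholder):

```bash
# same CPU request, with and without the hint; on hyperthreaded nodes the
# first job's AllocTRES should show twice the requested logical CPUs
sbatch --cpus-per-task=4 --hint=nomultithread --wrap 'sleep 120'
sbatch --cpus-per-task=4 --wrap 'sleep 120'

# while the jobs are running, compare requested vs. allocated TRES
scontrol show job <jobid> | grep -oE '(Req|Alloc)TRES=[^ ]+'
```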