DCV interactive app setup

Hi,
I am setting up DCV as an interactive app in OOD. Here are my scripts:

form.yml

---
attributes:
  cluster: "hpc-cluster-new"
  desktop: "dcv"
  cpu_cores:
    widget: select
    help: "CPU Cores for dcv session"
    options:
      - [ "vCPUs=1", "1" ]
      - [ "vCPUs=2", "2" ]
      - [ "vCPUs=4", "4" ]
      - [ "vCPUs=6", "6" ]
      - [ "vCPUs=8", "8" ]
    label: "CPU Cores"
  memory:
    widget: select
    help: "RAM"
    options:
      - [ "Memory=4GB", "4" ]
      - [ "Memory=8GB", "8" ]
      - [ "Memory=16GB", "16" ]
      - [ "Memory=32GB", "32" ]
    label: "Memory"
  gpu:
    widget: select
    help: "GPU"
    options:
      - [ "GPU=1", "1" ]
      - [ "GPU=2", "2" ]
      - [ "GPU=3", "3" ]
      - [ "GPU=4", "4" ]
    label: "GPU"
  session_timeout:
    widget: select
    options:
      - [ "5 minutes", "5m" ]
      - [ "1 hour",    "1h" ]
      - [ "2 hours",   "2h" ]
      - [ "4 hours",   "4h" ]
      - [ "1 day",     "1d" ]
      - [ "4 days",    "4d" ]
    label: "Session timeout"
form:
  - desktop
  - cpu_cores
  - memory
  - gpu
  - session_timeout
 

submit.yml.erb

---
cluster: "hpc-dev-cluster"
batch_connect:
  templates: "dcv"
script:
  job_name: "dcv"
  queue_name: "dcv"
  native:
    - "--exclusive"
    - "--cpus-per-task=<%= cpu_cores %>"
    - "--mem=<%= memory %>G"
    - "--gres=gpu:<%= gpu %>"
    - "--export"
    - "DCV_SESSION_TIMEOUT=<%= session_timeout %>"

I want the job to sleep for the specified duration. It was working earlier, but it suddenly stopped: the job goes into the completed state within a few seconds, and there is no output file I can examine for errors.

My before script is the default one, the cleanup script just removes a file, and the after script creates the session and everything else; I verified that part works fine. I think the problem is my

script.sh.erb (intended to sleep for the required time):

#!/bin/bash

# Change working directory to user's home directory
cd "${HOME}"

# Ensure that the user's configured login shell is used
export SHELL="$(getent passwd $USER | cut -d: -f7)"

declare -p >> dcv.log

# Start up desktop
echo "Launching desktop '<%= context.desktop %>'..." >> dcv.log
source "<%= session.staged_root.join("desktops", "#{context.desktop}.sh") %>" >> dcv.log
echo "Desktop '<%= context.desktop %>' ended..." >> dcv.log

if [ -n "${DCV_SESSION_TIMEOUT}" ]; then
    echo "Sleeping for session timeout of ${DCV_SESSION_TIMEOUT}, close in case of kills"

    # Convert session timeout to seconds (assumes format like "1 hour" or "60 minutes")
    # TIMEOUT_SECONDS=$(date -d "${DCV_SESSION_TIMEOUT}" +%s 2>/dev/null)

    if [ $? -eq 0 ]; then
        sleep ${DCV_SESSION_TIMEOUT} || {
            echo "Sleep interrupted, closing session." >> dcv.log
            exit 1
        }
    else
        echo "Invalid session timeout format: ${DCV_SESSION_TIMEOUT}" >> dcv.log
        exit 1
    fi
fi
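As far as I can tell GNU sleep accepts the suffixes I'm passing (s, m, h, d), so a stripped-down version of just the blocking part, which I'd expect to hold the job for the whole timeout, would look something like this (simplified sketch, not my full script):

#!/bin/bash
# Minimal sketch of the blocking guard only; assumes GNU coreutils sleep,
# which accepts suffixes like "5m", "1h", "4d"
timeout="${DCV_SESSION_TIMEOUT:-1h}"

# Only sleep if the value looks like <number><optional s|m|h|d>
if [[ "${timeout}" =~ ^[0-9]+[smhd]?$ ]]; then
    echo "Blocking for ${timeout}" >> dcv.log
    sleep "${timeout}"
else
    echo "Invalid session timeout format: ${timeout}" >> dcv.log
    exit 1
fi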


I want the job to keep running until the specified duration, but it's not. It was working fine earlier…

Hi @jeff.ohrstrom,
could you help me with this please?

Why are you redirecting to dcv.log? Doesn’t your scheduler capture stdout and stderr in output.log?
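If the job gets far enough to stage the session, that file should be sitting in the session’s output directory, something like this (the usual batch_connect layout; adjust the path if your dataroot differs):

# Most recent batch_connect sessions and their scheduler-captured output
ls -lt ~/ondemand/data/sys/dashboard/batch_connect/sys/*/output/*/output.log | head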

Also, I’m not sure why you’re putting so much logic in the script - can’t you just rely on the scheduler’s ability to delete the job after a specified time?
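For example, the form’s timeout strings could be translated into a Slurm walltime and passed as --time instead of sleeping in the script. A rough bash sketch of the mapping (hypothetical helper; the same mapping could just as well live in submit.yml.erb as ERB):

# Hypothetical helper: map the form's timeout options onto Slurm --time values
to_slurm_time() {
  case "$1" in
    5m) echo "00:05:00" ;;
    1h) echo "01:00:00" ;;
    2h) echo "02:00:00" ;;
    4h) echo "04:00:00" ;;
    1d) echo "1-00:00:00" ;;
    4d) echo "4-00:00:00" ;;
    *)  echo "01:00:00" ;;  # fallback
  esac
}

# e.g. sbatch --time=$(to_slurm_time "1h") ... would end the job after an hour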

Lastly I’d wonder about a script blocking vs going into the background. If you launch processes in the background, the script will exit directly after it issues that command. The scheduler in turn believes the job is complete because the script has exited. So it’s quite important that whatever commands you issue run in the foreground and block the script to ensure the scheduler doesn’t stop the job.
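A toy illustration, with sleep standing in for whatever your after.sh launches:

#!/bin/bash
# Toy example of background vs. foreground in a batch script
sleep 300 &   # backgrounded: on its own the script would hit the end
              # immediately and the scheduler would mark the job COMPLETED

wait          # blocking here keeps the batch script, and therefore the job,
              # in RUNNING state until the background process finishes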

I want it to remain in the running state, but instead it goes to completed. That’s why I am using sleep for the requested time.

For a while it shows as starting, then completed.

This is my cluster configuration. I am using Okta OIDC for authentication in my OOD portal. Maybe that’s the issue.

---
v2:
  metadata:
    title: "HPC Cluster"
    url: "https://localhost"
  login:
    host: "localhost"
    user: "%{user}"
    default: true
    auth: "ssh"
  job:
    adapter: "slurm"
    cluster: "<name>"
    bin: "/usr/bin"
    strict_host_checking: false
    ssh:
      UsePAM: true
    auth: "password"
    forward_ssh_key: false
  batch_connect:
    basic:
      script_wrapper: |
        module purge
        %s

My job is failing:

sacct -j 14450 --format=JobID,JobName,Partition,State,ExitCode,Start,End

JobID           JobName  Partition      State ExitCode               Start                 End 
------------ ---------- ---------- ---------- -------- ------------------- ------------------- 
14450               dcv        dcv     FAILED      1:0 2025-06-07T05:13:41 2025-06-07T05:14:02 
14450.batch       batch                FAILED      1:0 2025-06-07T05:13:41 2025-06-07T05:14:02 
14450.extern     extern             COMPLETED      0:0 2025-06-07T05:13:41 2025-06-07T05:14:02

I referred to this

Another error in the portal:

Exception: OodApp::SetupScriptFailed

Per user setup failed for script at /var/www/ood/apps/sys/myjobs/./bin/setup-production for user ADM_rsinghal4 with output: /usr/local/share/gems/gems/bundler-2.5.5/lib/bundler/vendor/thor/lib/thor/command.rb:2:in `<class:Thor>': superclass mismatch for class Command (TypeError)
  from /usr/local/share/gems/gems/bundler-2.5.5/lib/bundler/vendor/thor/lib/thor/command.rb:1:in `<top (required)>'
  from /usr/local/share/gems/gems/bundler-2.5.5/lib/bundler/vendor/thor/lib/thor/base.rb:1:in `require_relative'
  from /usr/local/share/gems/gems/bundler-2.5.5/lib/bundler/vendor/thor/lib/thor/base.rb:1:in `<top (required)>'
  from /usr/local/share/gems/gems/bundler-2.5.5/lib/bundler/vendor/thor/lib/thor.rb:1:in `require_relative'
  from /usr/local/share/gems/gems/bundler-2.5.5/lib/bundler/vendor/thor/lib/thor.rb:1:in `<top (required)>'
  from /usr/local/share/gems/gems/bundler-2.5.5/lib/bundler/vendored_thor.rb:8:in `require_relative'
  from /usr/local/share/gems/gems/bundler-2.5.5/lib/bundler/vendored_thor.rb:8:in `<top (required)>'
  from /usr/local/share/gems/gems/bundler-2.5.5/lib/bundler/friendly_errors.rb:3:in `require_relative'
  from /usr/local/share/gems/gems/bundler-2.5.5/lib/bundler/friendly_errors.rb:3:in `<top (required)>'
  from /usr/share/ruby/bundled_gems.rb:75:in `require'
  from /usr/share/ruby/bundled_gems.rb:75:in `block (2 levels) in replace_require'
  from /usr/local/share/gems/gems/bundler-2.5.5/exe/bundle:18:in `<top (required)>'
  from /var/www/ood/apps/sys/myjobs/bin/bundle:3:in `load'
  from /var/www/ood/apps/sys/myjobs/bin/bundle:3:in `<main>'
 

I’m not sure about the very last error; it seems like some library issue, but I’ve never seen that before.

As to the job failing, I’m still not seeing any output.log, so I can’t tell where it’s failing. Clearly from slurmdb it’s exiting 1 somewhere, but without the output.log I can’t say where. Maybe using set -x somewhere will help?
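For example, something like this near the top of script.sh.erb would trace every command and force the output somewhere you can find it, even if output.log never shows up (the log path is just an example):

# Send everything this script does to a known file and trace each command
exec >> "${HOME}/dcv-debug.log" 2>&1
set -x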