Hi,
I am setting up DCV as an interactive app in Open OnDemand (OOD). Here are my scripts:
form.yml
---
attributes:
  cluster: "hpc-cluster-new"
  desktop: "dcv"
  cpu_cores:
    widget: select
    help: "CPU Cores for dcv session"
    options:
      - [ "vCPUs=1", "1" ]
      - [ "vCPUs=2", "2" ]
      - [ "vCPUs=4", "4" ]
      - [ "vCPUs=6", "6" ]
      - [ "vCPUs=8", "8" ]
    label: "CPU Cores"
  memory:
    widget: select
    help: "RAM"
    options:
      - [ "Memory=4GB", "4" ]
      - [ "Memory=8GB", "8" ]
      - [ "Memory=16GB", "16" ]
      - [ "Memory=32GB", "32" ]
    label: "Memory"
  gpu:
    widget: select
    help: "GPU"
    options:
      - [ "GPU=1", "1" ]
      - [ "GPU=2", "2" ]
      - [ "GPU=3", "3" ]
      - [ "GPU=4", "4" ]
    label: "GPU"
  session_timeout:
    widget: select
    options:
      - [ "5 minutes", "5m" ]
      - [ "1 hour", "1h" ]
      - [ "2 hours", "2h" ]
      - [ "4 hours", "4h" ]
      - [ "1 day", "1d" ]
      - [ "4 days", "4d" ]
    label: "Session timeout"
form:
  - desktop
  - cpu_cores
  - memory
  - gpu
  - session_timeout
submit.yml.erb
---
cluster: "hpc-dev-cluster"
batch_connect:
  templates: "dcv"
script:
  job_name: "dcv"
  queue_name: "dcv"
  native:
    - "--exclusive"
    - "--cpus-per-task=<%= cpu_cores %>"
    - "--mem=<%= memory %>G"
    - "--gres=gpu:<%= gpu %>"
    - "--export"
    - "DCV_SESSION_TIMEOUT=<%= session_timeout %>"
I want the job to sleep for the specified duration. It was working earlier, but it suddenly stopped working: the job now goes into the completed state within a few seconds, and there is no output file that I can examine for errors.
My before script is the default one, and the cleanup script just removes a file.
The after script creates the session and everything else, and I have verified that it works fine. I think the problem is in my
script.sh.erb (intended to sleep for the required time)
#!/bin/bash
# Change working directory to user's home directory
cd "${HOME}"
# Ensure that the user's configured login shell is used
export SHELL="$(getent passwd $USER | cut -d: -f7)"
declare -p >> dcv.log
# Start up desktop
echo "Launching desktop '<%= context.desktop %>'..." >> dcv.log
source "<%= session.staged_root.join("desktops", "#{context.desktop}.sh") %>" >> dcv.log
echo "Desktop '<%= context.desktop %>' ended..." >> dcv.log
if [ -n "${DCV_SESSION_TIMEOUT}" ]; then
  echo "Sleeping for session timeout of ${DCV_SESSION_TIMEOUT}, close in case of kills"
  # Convert session timeout to seconds (assumes format like "1 hour" or "60 minutes")
  # TIMEOUT_SECONDS=$(date -d "${DCV_SESSION_TIMEOUT}" +%s 2>/dev/null)
  if [ $? -eq 0 ]; then
    sleep ${DCV_SESSION_TIMEOUT} || {
      echo \"Sleep interrupted, closing session.\" >> dcv.log
      exit 1
    }
  else
    echo "Invalid session timeout format: ${DCV_SESSION_TIMEOUT}" >> dcv.log
    exit 1
  fi
fi
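For reference, the conversion step that is commented out in the script could be sketched roughly as follows. This is only a sketch under assumptions: `timeout_to_seconds` is a hypothetical helper name (not part of OOD or DCV), and it assumes the timeout always looks like the values in form.yml, that is, an integer followed by an optional s, m, h, or d suffix.

```shell
#!/bin/bash
# Hypothetical helper (an assumption, not an OOD or DCV function):
# convert a duration such as "5m", "1h" or "4d" into seconds.
# Prints the number of seconds on success; returns non-zero on a bad format.
timeout_to_seconds() {
  local value="$1"
  # Accept a non-negative integer followed by an optional unit suffix.
  if [[ ! "$value" =~ ^([0-9]+)([smhd]?)$ ]]; then
    return 1
  fi
  local number="${BASH_REMATCH[1]}"
  local unit="${BASH_REMATCH[2]}"
  # 10# forces base 10 so that a value like "08" is not read as octal.
  case "$unit" in
    ""|s) echo "$(( 10#$number ))" ;;
    m)    echo "$(( 10#$number * 60 ))" ;;
    h)    echo "$(( 10#$number * 3600 ))" ;;
    d)    echo "$(( 10#$number * 86400 ))" ;;
  esac
}

# Possible use inside the job script (shown as comments, not executed here):
#   seconds="$(timeout_to_seconds "${DCV_SESSION_TIMEOUT}")" || {
#     echo "Invalid session timeout format: ${DCV_SESSION_TIMEOUT}" >> dcv.log
#     exit 1
#   }
#   sleep "${seconds}"
```

With a check like this, the validity test would depend on the timeout value itself rather than on the exit status of whichever command happened to run last.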
I want the job to keep running for the specified duration, but it does not. It was working fine earlier…
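Since there is no output file to examine, one thing that might help is redirecting everything the script prints into a log file near the top of script.sh.erb. A minimal sketch, assuming `start_logging` is a hypothetical helper name and that the chosen log path is writable from the compute node:

```shell
#!/bin/bash
# Hypothetical helper (an assumption, not an OOD or DCV function):
# send all later stdout and stderr of the current shell to a log file.
start_logging() {
  local log_file="$1"
  # Append both streams to the log file so errors survive after the job ends.
  exec >> "$log_file" 2>&1
  # Record whether the timeout variable actually reached the job environment.
  echo "DCV_SESSION_TIMEOUT is '${DCV_SESSION_TIMEOUT:-<unset>}'"
}
```

Calling `start_logging "${HOME}/dcv.log"` right after the `cd` line would show in the log whether DCV_SESSION_TIMEOUT was set inside the job, which is what decides whether the sleep branch runs at all.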