I’m setting up Open OnDemand 2.0.28-1 for the first time on one of our RHEL 7 clusters, with an SGE 8.1.9 scheduler.
I’ve got most of it working so far, but here’s the problem I’m having:
- In the Batch Connect interface, I can launch a job with no problem: it queues and runs.
- Once it starts running, it shows "Host: Undetermined".
- If I try to launch the noVNC desktop, it fails to connect because the websocket URL does not contain the hostname. For example, the websocket URL looks like `wss://ood.ext.myriad.ucl.ac.uk/rnode//1343/websockify`.
- If I find out which node the job is running on and write it into the noVNC URL by hand, it connects successfully and gives me a desktop.
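The broken URL is consistent with the host part simply being empty when the path is assembled. Here is a minimal sketch that assumes the `/rnode/<host>/<port>/websockify` pattern seen in the URL above. The function `ws_url` is made up for illustration and is not OOD's actual code:

```shell
#!/bin/bash
# Hypothetical helper: build a noVNC websocket URL, assuming the
# /rnode/<host>/<port>/websockify pattern seen in the failing URL.
ws_url() {
  local proxy_host="$1" node="$2" port="$3"
  printf 'wss://%s/rnode/%s/%s/websockify\n' "$proxy_host" "$node" "$port"
}

# With an empty node name the path collapses to "rnode//<port>",
# which matches what the dashboard generates here.
ws_url ood.ext.myriad.ucl.ac.uk "" 1343
```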
I'm not sure where to look for how this hostname is supposed to be determined. Going by an older thread, I tried adding a `set_host` key to my cluster definition:
batch_connect:
  # This may need some tweaking to get right
  # See: https://osc.github.io/ood-documentation/master/installation/cluster-config-schema.html#batch-connect
  basic:
    script_wrapper: "%s"
    set_host: "host=$(hostname -a)"
  vnc:
    script_wrapper: "singularity exec -B /shared,/lustre,/run /shared/ucl/apps/siflib/centos-8s-quay.sif %s"
    set_host: "host=$(hostname -a)"
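It's worth checking on a compute node what each candidate `set_host` string actually yields. `hostname -a` prints the host's alias names, and that output can be empty when no aliases are configured. The small sketch below compares the candidates. It assumes the `set_host` string is evaluated as a plain shell assignment that leaves `$host` set, and `check_set_host` is a made-up name:

```shell
#!/bin/bash
# Sketch: evaluate a set_host-style assignment and report whether it
# leaves $host empty (assumption: OOD runs the string as plain shell).
check_set_host() {
  local cmd="$1" host
  eval "$cmd"   # e.g. cmd='host=$(hostname)' assigns the local $host
  if [ -z "$host" ]; then
    echo "EMPTY: '$cmd'"
  else
    echo "ok: '$cmd' -> $host"
  fi
}

check_set_host 'host=$(hostname)'
check_set_host 'host=$(hostname -a)'
```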
I also added it to `/etc/ood/config/apps/bc_desktop/submit/container.yml.erb`, a setup I adapted from the thread "Mate Desktop in a Singularity container?" (post #40 by novosirj):
---
script:
  # customised, not sure about this yet
  # the SGE adapter setup for this seems different to others?
  # nodes: "<%= bc_num_slots %><%= node_type %>"
  native:
    - -pe
    - smp
    - <%= num_cores.blank? ? 1 : num_cores.to_i %>
    - -l
    - mem=<%= memory_gigs %>G
    - -l
    - tmpfs=<%= tmpfs_gigs %>G
template: "vnc"
batch_connect:
  websockify_cmd: '/usr/bin/websockify'
  header: |
    #!/bin/bash -l
    source ~/.bashrc
  set_host: "host=$(hostname -a)"
  script_wrapper: |
    set -x
    module purge
    module load singularity-env
    cat << "CTRSCRIPT" > container.sh
    export PATH="$PATH:/opt/TurboVNC/bin"
    %s
    CTRSCRIPT
    # customised
    export SINGULARITY_BINDPATH="$HOME,/lustre,/scratch,/shared,/var,/run,/tmp,/tmpdir"
    singularity run <%= image %> /bin/bash container.sh
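In the wrapper above, the quoted heredoc delimiter (`"CTRSCRIPT"`) makes bash write the body of `container.sh` verbatim. Variables such as `$PATH` are therefore expanded later, inside the container, and not when the wrapper runs on the host. The sketch below shows that behaviour next to an unquoted delimiter for comparison. The file names are illustrative, and the temporary directory is left in place so it can be inspected:

```shell
#!/bin/bash
# Sketch: quoted vs. unquoted heredoc delimiters.
tmp=$(mktemp -d)
cd "$tmp" || exit 1

# Quoted delimiter: the body is written literally, "$PATH" and all.
cat << "CTRSCRIPT" > container.sh
echo "PATH inside: $PATH"
CTRSCRIPT

# Unquoted delimiter: $PATH is expanded at write time, on the host.
cat << CTRSCRIPT > expanded.sh
echo "PATH at write time: $PATH"
CTRSCRIPT

grep -F '$PATH' container.sh   # still contains the literal $PATH
cat expanded.sh                # contains the host's PATH value
```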
And here's the corresponding `form.yml.erb`:
title: "Myriad Desktop"
cluster: "myriad"
submit: submit/container.yml.erb
attributes:
  desktop: "mate"
  bc_vnc_idle: 0
  bc_vnc_resolution:
    required: true
  node_type: null
  memory_gigs:
    widget: "number_field"
    label: "Gigabytes of memory"
    value: 4
  tmpfs_gigs:
    widget: "number_field"
    label: "Gigabytes of temporary on-node storage"
    value: 10
form:
  - bc_vnc_idle
  - desktop
  - bc_account
  - bc_num_hours
  - bc_num_slots
  - num_cores
  - memory_gigs
  - tmpfs_gigs
  - node_type
  - bc_queue
  - bc_vnc_resolution
  - bc_email_on_started
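With the default form values above (`memory_gigs` 4, `tmpfs_gigs` 10) and `num_cores` left blank, the `native` list in `container.yml.erb` should expand to roughly the qsub options below. This sketch only echoes the command and does not submit anything. `job_script.sh` is a placeholder name, and `${num_cores:-1}` approximates the ERB's `blank?` fallback to 1:

```shell
#!/bin/bash
# Sketch: the qsub arguments the "native" list would produce for the
# default form values (placeholder values; nothing is submitted).
num_cores=""      # blank in the form -> falls back to 1, as in the ERB
memory_gigs=4
tmpfs_gigs=10

# Positional parameters keep each option as a separate word.
set -- -pe smp "${num_cores:-1}" -l "mem=${memory_gigs}G" -l "tmpfs=${tmpfs_gigs}G"
echo qsub "$@" job_script.sh
```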
But setting that `set_host` field didn't seem to change anything. As far as I can tell, the default definition is just `hostname` anyway, if I'm reading `/opt/ood/ondemand/root/usr/share/gems/2.7/ondemand/2.0.28/gems/ood_core-0.21.0/lib/ood_core/batch_connect/template.rb` correctly.
Any pointers would be appreciated. My guess is that this would ideally be set somewhere in the SGE adapter, maybe as an `exec_host` field on a job object, but I'm not entirely sure.
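For reference, SGE names queue instances `<queue>@<host>`, for example in the `queue` column that `qstat` shows for a running job. The execution host can therefore be recovered from that string. Below is a minimal sketch of the extraction. The function name is hypothetical, and it is not how the ood_core SGE adapter actually does it:

```shell
#!/bin/bash
# Hypothetical helper: strip everything up to the first "@" from an SGE
# queue instance name ("<queue>@<host>") to get the execution host.
host_from_queue_instance() {
  local qi="$1"
  echo "${qi#*@}"
}

host_from_queue_instance "all.q@node-a.example"
```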