Host undetermined in Batch Connect, noVNC

I’m setting up Open OnDemand 2.0.28-1 for the first time on one of our RHEL 7 clusters, with an SGE 8.1.9 scheduler.

I’ve got most of it working so far, but here’s the problem I’m having:

  • in the Batch Connect interface, I can launch a job, no problem, it queues and runs,
  • when it starts running, it shows “Host: Undetermined”,
  • if I try to launch the noVNC desktop, it fails to connect, because the websocket URL is missing the hostname: it comes out looking like wss://ood.ext.myriad.ucl.ac.uk/rnode//1343/websockify (note the empty segment after rnode/),
  • if I find out which node the job is running on myself and write its hostname into the noVNC URL, it connects successfully and gives me a desktop.

I’m not sure where to look for how this hostname is supposed to be determined. Going by an older thread, I tried adding a set_host key to my cluster definition:

  batch_connect:
      # This may need some tweaking to get right
      # See: https://osc.github.io/ood-documentation/master/installation/cluster-config-schema.html#batch-connect
      basic:
        script_wrapper: "%s"
        set_host: "host=$(hostname -a)"
      vnc:
        script_wrapper: "singularity exec -B /shared,/lustre,/run /shared/ucl/apps/siflib/centos-8s-quay.sif %s"
        set_host: "host=$(hostname -a)"
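(For anyone following along: my understanding, which may be off, is that the generated batch script evals this set_host line and then interpolates $host into the session’s connection.yml. Roughly like this sketch, which is a paraphrase and not the actual generated code, with an illustrative port value:)

```shell
# Rough sketch of what the generated batch script does with set_host
# (paraphrased; not the actual ood_core-generated code)
host=$(hostname -a) || true   # the set_host line; empty on our nodes
port=5901                     # illustrative value

cat > connection.yml <<EOF
host: $host
port: $port
EOF

cat connection.yml
```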

I also set it in this /etc/ood/config/apps/bc_desktop/submit/container.yml.erb, adapted from this thread: Mate Desktop in a Singularity container? - #40 by novosirj

---
script:
  # customised, not sure about this yet
  # the SGE adapter setup for this seems different to others?
  #  nodes: "<%= bc_num_slots %><%= node_type %>"
  native:
     - -pe
     - smp
     - <%= num_cores.blank? ? 1 : num_cores.to_i %>
     - -l
     - mem=<%= memory_gigs %>G
     - -l
     - tmpfs=<%= tmpfs_gigs %>G
  template: "vnc"
batch_connect:
  websockify_cmd: '/usr/bin/websockify'
  header: |
    #!/bin/bash -l
    source ~/.bashrc
  set_host: "host=$(hostname -a)"
  script_wrapper: |
    set -x
    module purge
    module load singularity-env
    cat << "CTRSCRIPT" > container.sh
    export PATH="$PATH:/opt/TurboVNC/bin"
    %s
    CTRSCRIPT

    # customised
    export SINGULARITY_BINDPATH="$HOME,/lustre,/scratch,/shared,/var,/run,/tmp,/tmpdir"

    singularity run <%= image %> /bin/bash container.sh
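One detail worth noting in that wrapper: quoting the heredoc delimiter (<< "CTRSCRIPT") stops the shell from expanding anything while writing container.sh, so $PATH and the script substituted at %s land in the file literally and only get expanded later, when container.sh runs inside the container. A standalone sketch of that pattern:

```shell
#!/bin/bash
# Quoted delimiter => no expansion while writing the file; the literal
# text is only expanded later, when container.sh itself is executed.
cat << "CTRSCRIPT" > container.sh
export PATH="$PATH:/opt/TurboVNC/bin"
echo "running on $(hostname -s)"
CTRSCRIPT

grep -c '\$PATH' container.sh   # prints 1: the literal $PATH survived
```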

And here’s the corresponding form.yml.erb:

title: "Myriad Desktop"
cluster: "myriad"
submit: submit/container.yml.erb
attributes:
  desktop: "mate"
  bc_vnc_idle: 0
  bc_vnc_resolution:
    required: true
  node_type: null

  memory_gigs:
    widget: "number_field"
    label: "Gigabytes of memory"
    value: 4
  tmpfs_gigs:
    widget: "number_field"
    label: "Gigabytes of temporary on-node storage"
    value: 10

form:
  - bc_vnc_idle
  - desktop
  - bc_account
  - bc_num_hours
  - bc_num_slots
  - num_cores
  - memory_gigs
  - tmpfs_gigs
  - node_type
  - bc_queue
  - bc_vnc_resolution
  - bc_email_on_started

But setting that set_host field didn’t seem to change anything. As far as I can tell, the default definition is just a plain hostname call anyway, coming from /opt/ood/ondemand/root/usr/share/gems/2.7/ondemand/2.0.28/gems/ood_core-0.21.0/lib/ood_core/batch_connect/template.rb?
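For reference, the relevant default in that template.rb appears to reduce to just this (my paraphrase of the Ruby into shell, so treat it as a sketch rather than the exact code):

```shell
# Default set_host line, as I read it: plain hostname, no flags
host=$(hostname)
echo "host=$host"
```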

Any pointers would be appreciated. My guess is that this would ideally be set somewhere in the SGE adapter, maybe as an exec_host field on a job object, but I’m not entirely sure.

Hi Ian.

Thanks for your post.

I will see what I can find out for you.

Thanks,
-gerald

Trawling back from the “Undetermined” string in the i18n files to session.rb in the batch_connect classes pointed me towards the generated connection.yml files.

Looking at those files for jobs I’ve submitted shows that they have an empty host field, and if I fill it in manually, the noVNC session works from the web interface.
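The manual fix, sketched here on a scratch copy (the real connection.yml lives under the session’s output directory, and the node name below is made up):

```shell
# Reproduce the broken file: empty host field
cat > /tmp/connection.yml <<EOF
host:
port: 5901
EOF

# Fill in the node the job actually landed on (illustrative name)
sed -i 's/^host: *$/host: node-d00a-001/' /tmp/connection.yml

grep '^host:' /tmp/connection.yml   # prints: host: node-d00a-001
```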

So I worked out how those files are generated… and it turned out:

  • hostname -a returns a blank string on our cluster nodes
  • hostname wasn’t even installed in the Singularity image (thanks, default CentOS 8 Stream image)
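Both failures are easy to check directly; here’s a quick sketch (the image path is the one from my config, adjust for your site):

```shell
# On a compute node: -a asks for alias names, which can legitimately be
# empty, while -s always returns the short hostname.
hostname -a || true   # empty on our nodes
hostname -s           # e.g. the node's short name

# Inside the image: is hostname installed at all?
singularity exec /shared/ucl/apps/siflib/centos-8s-quay.sif \
  which hostname || echo "hostname missing from image"
```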

I’ve changed the set_host fields to host=$(hostname -s) everywhere and added hostname to the Singularity image; the host is now detected correctly and the noVNC sessions start without any manual steps.

:+1: