Host undetermined in Batch Connect, noVNC

I’m setting up Open OnDemand 2.0.28-1 for the first time on one of our RHEL 7 clusters, with an SGE 8.1.9 scheduler.

I’ve got most of it working so far, but here’s the problem I’m having:

  • in the Batch Connect interface, I can launch a job with no problem: it queues and runs,
  • once it starts running, the session card shows “Host: Undetermined”,
  • if I try to launch the noVNC desktop, it fails to connect because the websocket URL has no hostname in it, e.g. it looks like wss://,
  • if I find out which node the job is running on and write it into the noVNC URL by hand, it connects and gives me a desktop.
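
For anyone who needs the same manual workaround, here is a rough sketch of pulling the node name out of an SGE queue instance, which qstat prints in the form queue@host for running jobs. The function name and the example values (all.q, node-01) are made up for illustration, and the qstat pipeline in the comment is an assumption about how you might feed it, not something taken from this setup.

```shell
#!/bin/bash
# Extract the host part of an SGE queue instance such as "all.q@node-01".
# "all.q" and "node-01" are hypothetical example values.
host_from_queue_instance() {
  local instance="$1"
  case "$instance" in
    *@*) printf '%s\n' "${instance#*@}" ;;  # strip everything up to the first "@"
    *)   return 1 ;;                        # no "@": not a queue instance
  esac
}

# Possible use against live output (running jobs for the current user):
#   qstat -s r -u "$USER" \
#     | awk '{ for (i = 1; i <= NF; i++) if ($i ~ /@/) print $i }' \
#     | while read -r qi; do host_from_queue_instance "$qi"; done
```

The resulting name is what has to be pasted into the noVNC URL by hand while the host is still undetermined.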

I’m not sure where to look for how this hostname is determined. Going by an older thread, I tried adding a set_host key to my cluster definition:

      # This may need some tweaking to get right
      # See:
        script_wrapper: "%s"
        set_host: "host=$(hostname -a)"
        script_wrapper: "singularity exec -B /shared,/lustre,/run /shared/ucl/apps/siflib/centos-8s-quay.sif %s"
        set_host: "host=$(hostname -a)"

I also set it in /etc/ood/config/apps/bc_desktop/submit/container.yml.erb, which I adapted from the thread “Mate Desktop in a Singularity container?” (post #40 by novosirj):

  # customised, not sure about this yet
  # the SGE adapter setup for this seems different to others?
  #  nodes: "<%= bc_num_slots %><%= node_type %>"
     - -pe
     - smp
     - <%= num_cores.blank? ? 1 : num_cores.to_i %>
     - -l
     - mem=<%= memory_gigs %>G
     - -l
     - tmpfs=<%= tmpfs_gigs %>G
  template: "vnc"
  websockify_cmd: '/usr/bin/websockify'
  header: |
    #!/bin/bash -l
    source ~/.bashrc
  set_host: "host=$(hostname -a)"
  script_wrapper: |
    set -x
    module purge
    module load singularity-env
    cat << "CTRSCRIPT" >
    export PATH="$PATH:/opt/TurboVNC/bin"

    # customised
    export SINGULARITY_BINDPATH="$HOME,/lustre,/scratch,/shared,/var,/run,/tmp,/tmpdir"

    singularity run <%= image %> /bin/bash

And here’s the corresponding form.yml.erb:

title: "Myriad Desktop"
cluster: "myriad"
submit: submit/container.yml.erb
  desktop: "mate"
  bc_vnc_idle: 0
    required: true
  node_type: null

    widget: "number_field"
    label: "Gigabytes of memory"
    value: 4
    widget: "number_field"
    label: "Gigabytes of temporary on-node storage"
    value: 10

  - bc_vnc_idle
  - desktop
  - bc_account
  - bc_num_hours
  - bc_num_slots
  - num_cores
  - memory_gigs
  - tmpfs_gigs
  - node_type
  - bc_queue
  - bc_vnc_resolution
  - bc_email_on_started

But setting set_host didn’t seem to change anything. As far as I can tell, the default definition is just hostname anyway, going by /opt/ood/ondemand/root/usr/share/gems/2.7/ondemand/2.0.28/gems/ood_core-0.21.0/lib/ood_core/batch_connect/template.rb?

Any pointers would be appreciated. I’m guessing this would ideally be set somewhere in the SGE adapter, maybe as an exec_host field on a job object, but I’m not entirely sure.

Hi Ian.

Thanks for your post.

Trawling back from the “Undetermined” string in the i18n files to the session.rb file in the batch_connect classes pointed me towards the generated connection.yml files.

Looking at those files for jobs I’ve submitted showed that they have an empty host field, and if I fix the file manually, the noVNC sessions work from the web interface.
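
In case it helps someone debugging the same symptom, this is a minimal sketch of that check: it reads the host line out of a connection.yml and fails if the value is empty. The function name is hypothetical, and it assumes the file has a top-level line starting with "host:", which matches what I saw in mine.

```shell
#!/bin/bash
# Print the host value from a connection.yml, or fail if it is empty or missing.
# Assumes a top-level "host: <value>" line.
connection_host() {
  local file="$1" host
  # Take the first "host:" line, strip the key and any surrounding whitespace.
  host=$(sed -n 's/^host:[[:space:]]*//p' "$file" 2>/dev/null \
           | head -n 1 \
           | sed 's/[[:space:]]*$//')
  if [ -z "$host" ]; then
    echo "empty or missing host in $file" >&2
    return 1
  fi
  printf '%s\n' "$host"
}
```

Run it as connection_host /path/to/connection.yml against the file in the session's output directory; a non-zero exit is the same condition that shows up as “Host: Undetermined” in the web interface.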

So I worked out how those files are generated… and it turned out:

  • hostname -a returns a blank string on our cluster nodes
  • hostname wasn’t even installed in the Singularity image (thanks, default CentOS 8 Stream image)

I’ve changed the set_host fields to host=$(hostname -s) everywhere and added hostname to the Singularity image. The host is now picked up correctly and the noVNC sessions start without any manual steps.
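
Both failure modes (a hostname flag that returns an empty string, and hostname not being installed at all) can be checked with a small diagnostic like the one below, run once on a compute node and once inside the container. This is only a sketch: first_nonempty and detect_host are hypothetical helper names, and falling back to uname -n is my own assumption, not something Open OnDemand does.

```shell
#!/bin/bash
# Return the first non-empty argument; fail if every argument is empty.
first_nonempty() {
  local candidate
  for candidate in "$@"; do
    if [ -n "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

# Try the short name first, then plain hostname, then the kernel's node name.
# 2>/dev/null hides "command not found" when hostname is missing from the image.
detect_host() {
  first_nonempty \
    "$(hostname -s 2>/dev/null)" \
    "$(hostname 2>/dev/null)" \
    "$(uname -n 2>/dev/null)"
}

# Report whether the hostname binary is available at all.
if command -v hostname >/dev/null 2>&1; then
  echo "hostname found at: $(command -v hostname)"
else
  echo "hostname is not installed here"
fi
```

Calling detect_host afterwards prints whichever name was found first. If it prints nothing on a node or in the image, set_host will produce an empty host there too.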


