Having successfully deployed an OOD 1.5.5 instance in a sandbox on a VM, I am now trying to do the same in our HPC environment. I'm having an issue with the Interactive Desktop: the environment variables I have defined in my cluster configuration in /etc/ood/config/clusters.d are somehow not making their way into the slurm_script that runs the desktop on the compute node. As a result, the desktop session looks for websockify at the default location (/opt/websockify/run), where it is not found. Although I can see all the MATE desktop processes running, the websocket server never starts, so I am unable to connect.
Here is the cluster configuration in my HPC environment:
```yaml
v2:
  metadata:
    title: "HPC"
  login:
    host: "login.hpc.myschool.edu"
  job:
    adapter: "slurm"
    cluster: "slurm_cluster"
    bin: "/cm/shared/apps/slurm/current/bin"
    conf: "/cm/shared/apps/slurm/var/etc/slurm.conf"
  batch_connnect:
    basic:
      script_wrapper: |
        module purge
        source /etc/environment
        %s
    vnc:
      script_wrapper: |
        module purge
        source /etc/environment
        module load python
        export PATH="/opt/TurboVNC/bin:${PATH}"
        export WEBSOCKIFY_CMD="/usr/bin/websockify"
        %s
```
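For context, the `%s` in each `script_wrapper` marks where the generated job script is placed, so everything above it should end up at the top of job_script_content.sh. The POSIX shell sketch below only mimics that substitution to show the expected result. `wrap_script` is a hypothetical helper, not Open OnDemand's implementation, and it assumes the wrapper contains a single `%s` placeholder.

```shell
#!/bin/sh
# Hypothetical sketch, not Open OnDemand source: splice a generated job
# script into a script_wrapper at its first "%s" placeholder.
wrap_script() {
  wrapper=$1
  body=$2
  case $wrapper in
    *'%s'*) ;;
    *) echo "wrapper has no %s placeholder" >&2; return 1 ;;
  esac
  before=${wrapper%%'%s'*}  # everything before the first %s
  after=${wrapper#*'%s'}    # everything after the first %s
  printf '%s%s%s' "$before" "$body" "$after"
}

# Example: an abbreviated version of the vnc wrapper from the config above.
vnc_wrapper='module purge
source /etc/environment
export WEBSOCKIFY_CMD="/usr/bin/websockify"
%s
'
wrap_script "$vnc_wrapper" 'echo "Script starting..."'
```

With the abbreviated `vnc` wrapper, the output starts with `module purge` and the `export WEBSOCKIFY_CMD=...` line before the job script body, which matches what the sandbox's job_script_content.sh shows.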
Besides the hostname, the only difference in our (working) sandbox is that Python 3 is installed directly from an RPM rather than loaded as a module. When we start a desktop in the sandbox environment, the job_script_content.sh produced alongside the user's output log starts with the lines from the cluster config:
```shell
module purge
source /etc/environment
export PATH="/opt/TurboVNC/bin:$PATH"
export WEBSOCKIFY_CMD="/usr/bin/websockify"
```
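One quick way to compare sessions is to check whether a given job_script_content.sh contains one of these wrapper lines. The sketch below is a hypothetical helper (`wrapper_applied` is not part of Open OnDemand) that greps for the `WEBSOCKIFY_CMD` export; the path in the usage comment is a placeholder.

```shell
#!/bin/sh
# Hypothetical helper: report whether a job_script_content.sh contains the
# WEBSOCKIFY_CMD export that the vnc script_wrapper is expected to prepend.
wrapper_applied() {
  if grep -q '^export WEBSOCKIFY_CMD=' "$1"; then
    echo "wrapper applied"
  else
    echo "wrapper missing"
    return 1
  fi
}

# Usage (placeholder path; point it at the session's copy of the file):
# wrapper_applied /path/to/job_script_content.sh
```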
On our new HPC instance, those lines are missing from job_script_content.sh altogether, and about 16 lines into output.log, when the script attempts to start websockify, it logs this:
```
Script starting...
Starting websocket server...
/cm/local/apps/slurm/var/spool/job7940461/slurm_script: line 143: /opt/websockify/run: No such file or directory
```
In both environments, websockify is located at /usr/bin/websockify, not /opt/websockify/run.
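The log is consistent with the desktop script falling back to a built-in default when `WEBSOCKIFY_CMD` is unset, which is what would happen if the `vnc` wrapper's `export` line never ran. The sketch below assumes a `${WEBSOCKIFY_CMD:-/opt/websockify/run}`-style expansion; that is an assumption about the template, not something the logs above confirm. `resolve_websockify_cmd` is a hypothetical name used only for illustration.

```shell
#!/bin/sh
# Sketch: pick the websockify command the way a "${VAR:-default}" expansion
# would. Assumes (not confirmed) the desktop template uses this pattern with
# /opt/websockify/run as the default.
resolve_websockify_cmd() {
  printf '%s\n' "${WEBSOCKIFY_CMD:-/opt/websockify/run}"
}

# Wrapper line missing: variable unset, so the default path is used.
(unset WEBSOCKIFY_CMD; resolve_websockify_cmd)   # /opt/websockify/run
# Wrapper line present: the exported value wins.
(export WEBSOCKIFY_CMD=/usr/bin/websockify; resolve_websockify_cmd)   # /usr/bin/websockify
```

Note that `:-` also falls back to the default when the variable is set but empty.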
Can anyone tell me why this is happening and what I need to do to fix it?
Thank you,
Richard