Hi there,
I am trying to deploy Open OnDemand (OOD) 3.0 on a Slurm cluster running CentOS 8.4.
Everything works except the interactive desktop and the interactive apps.
TurboVNC and Websockify are installed in a public (shared) path, and both can be started successfully.
I also installed XFCE into the public path with the command 'dnf --installroot=/public/software/wesee/xfce groupinstall xfce'. However, the interactive desktop does not start, and neither do interactive apps that use the XFCE desktop.
Here is the job's output.log:
Setting VNC password...
Starting VNC server...
Killing Xvnc process ID 1144317
Xvnc process ID 1144317 already killed
Xvnc did not appear to shut down cleanly. Removing /tmp/.X11-unix/X5
Xvnc did not appear to shut down cleanly. Removing /tmp/.X5-lock
WARNING: sim3:1 is taken because of /tmp/.X1-lock
Remove this file if there is no X server sim3:1
WARNING: sim3:2 is taken because of /tmp/.X2-lock
Remove this file if there is no X server sim3:2
WARNING: sim3:3 is taken because of /tmp/.X3-lock
Remove this file if there is no X server sim3:3
WARNING: sim3:4 is taken because of /tmp/.X4-lock
Remove this file if there is no X server sim3:4
Desktop 'TurboVNC: sim3:5 (hpctest)' started on display sim3:5
Log file is vnc.log
Successfully started VNC server on sim3:5900...
Script starting...
Starting websocket server...
ERROR: Collection default cannot be found
Launching desktop 'xfce'...
Failed to init libxfconf: Error spawning command line “dbus-launch --autolaunch=499c7de025e24973b45c7ee39a1c82b9 --binary-syntax --close-stderr”: Child process exited with code 1.
/public/software/wesee/websockify/usr/lib/python3.6/site-packages/websockify/websocket.py:31: UserWarning: no 'numpy' module, HyBi protocol will be slower
  warnings.warn("no 'numpy' module, HyBi protocol will be slower")
WebSocket server settings:
  - Listen on :38144
  - No SSL/TLS support (no cert file)
  - Backgrounding (daemon)
Scanning VNC log file for user authentications...
Generating connection YAML file...
Failed to init libxfconf: Error spawning command line “dbus-launch --autolaunch=499c7de025e24973b45c7ee39a1c82b9 --binary-syntax --close-stderr”: Child process exited with code 1.
Unable to init server: Could not connect: Connection refused
xfce4-session: Cannot open display: .
Type ‘xfce4-session --help’ for usage.
Desktop 'xfce' ended...
Cleaning up...
/opt/gridview/slurm/spool/slurmd/job00061/slurm_script: line 25: 1148837 Terminated while read -r line; do
    if [[ ${line} =~ "Full-control authentication enabled for" ]]; then
        change_passwd; create_yml;
    fi;
done < <(tail -f --pid=${SCRIPT_PID} "vnc.log")
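The WARNING lines in this log point at lock files for displays sim3:1 to sim3:4 and say to remove them "if there is no X server". To see which of them are actually stale, a read-only check along these lines could be run on sim3. It is a sketch that assumes each lock file holds the X server's PID as text (the usual X lock format) and that /proc lists other users' processes (no hidepid mount option); the function name list_stale_x_locks is hypothetical.

```shell
# Hypothetical helper: read-only check for stale X display lock files
# (e.g. /tmp/.X1-lock). Assumes each lock holds the X server's PID as text
# and that /proc shows other users' processes. Prints findings; deletes nothing.
# The body runs in a subshell ( ... ) so its variables do not leak.
list_stale_x_locks() (
  dir=${1:-/tmp}
  for lock in "$dir"/.X*-lock; do
    [ -e "$lock" ] || continue                       # pattern matched no files
    pid=$(tr -d '[:space:]' 2>/dev/null < "$lock")   # drop padding and newline
    case $pid in
      ''|*[!0-9]*) echo "unreadable: $lock" ;;
      *) [ -d "/proc/$pid" ] || echo "stale: $lock (PID $pid not running)" ;;
    esac
  done
)
```

Called as 'list_stale_x_locks /tmp', it only prints candidates; deleting any of them is still a manual decision once it is confirmed that no X server owns them.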
My cluster configuration file is as follows:
v2:
  metadata:
    title: "Cluster"
  login:
    host: "10.10.10.100"
  job:
    adapter: "slurm"
    bin: "/opt/gridview/slurm/usr/bin"
    conf: "/opt/gridview/slurm/etc/slurm/slurm.conf"
  batch_connect:
    basic:
      script_wrapper: |
        module purge
        %s
    vnc:
      script_wrapper: |
        module purge
        export PATH="/public/software/wesee/TurboVNC/bin:$PATH"
        export PATH="/public/software/wesee/xfce/bin:$PATH"
        export WEBSOCKIFY_CMD="/public/software/wesee/websockify/usr/bin/websockify"
        %s
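Since 'dnf --installroot' lays files out as if the install root were '/', the XFCE group ends up under /public/software/wesee/xfce/usr/... and /public/software/wesee/xfce/etc/..., which are outside the default library and XDG search paths. If that turns out to matter here, the vnc script_wrapper might need a few more exports before %s. The lines below are an untested sketch: XFCE_ROOT is a hypothetical variable name, and both the usr/lib64 layout and the compatibility of the installroot's libraries with the compute nodes' own (CentOS 8.4 on both sides here) are assumptions.

```shell
# Hypothetical extra lines for the vnc script_wrapper, assuming XFCE was
# installed with: dnf --installroot=/public/software/wesee/xfce groupinstall xfce
# dnf lays out files under the installroot as if it were "/".
XFCE_ROOT=/public/software/wesee/xfce   # hypothetical variable name

# Executables from the installroot, e.g. xfce4-session.
export PATH="$XFCE_ROOT/usr/bin:$PATH"
# Shared libraries that exist only inside the installroot (64-bit layout assumed).
export LD_LIBRARY_PATH="$XFCE_ROOT/usr/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
# Desktop data and default config; fall back to the XDG spec defaults when
# unset so the system directories stay in the search path.
export XDG_DATA_DIRS="$XFCE_ROOT/usr/share:${XDG_DATA_DIRS:-/usr/local/share:/usr/share}"
export XDG_CONFIG_DIRS="$XFCE_ROOT/etc/xdg:${XDG_CONFIG_DIRS:-/etc/xdg}"
```

Pointing LD_LIBRARY_PATH at a whole installroot can also pull in that tree's copies of core system libraries, so these lines are worth trying in an interactive shell on a compute node before they go into the config.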
Can anyone help me figure out what might be causing this problem?
The cluster consists of one control node (10.10.10.100) and two compute nodes (10.10.10.101 and 10.10.10.102). Did I miss something in the configuration file?
Also, my understanding is that TurboVNC, Websockify, and the XFCE desktop all need to be installed under a public path that is accessible from all nodes. Is that correct?
Is there any way to install these services only on the control node, so that users can still submit jobs to the cluster through the control node's GUI desktop?
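Whether a compute node can actually see the shared installs can be checked from that node directly. A minimal sketch, assuming a POSIX shell on the nodes; check_cmds is a hypothetical helper that reports which commands are not on the node's PATH and returns non-zero if any are missing.

```shell
# Hypothetical helper: report commands that cannot be found on this node's PATH.
# Returns 0 if every command is found, 1 otherwise.
# The body runs in a subshell ( ... ), so "exit" sets the function's status.
check_cmds() (
  missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing on $(uname -n): $cmd"
      missing=1
    fi
  done
  exit "$missing"
)
```

For example, after the same PATH additions as the vnc script_wrapper ('export PATH="/public/software/wesee/TurboVNC/bin:/public/software/wesee/xfce/bin:$PATH"'), a call such as 'check_cmds vncserver xfce4-session dbus-launch /public/software/wesee/websockify/usr/bin/websockify' lists anything that node cannot find. Putting the function and that call in a script on a shared path and running it with 'srun -w <node> sh <script>' would repeat the check on a specific compute node.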