I’ve got two OnDemand systems, an old one and a new one. The old one is version 1.6.25-1.el7, the new one is 2.0.29-1.el8. I’m trying to get the new one set up and running. I copied over my RStudio app that lives in /var/www/ood/apps/sys/RStudio and adjusted the names so that it works.
Looking at the screenshots, I’m not sure the window manager is running.
Could you share the submit.yml.erb and the output.log from the session logs when you launch? It will help to know what is set and if any errors or warnings are in that output.log.
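If it helps to locate them, this is just a sketch assuming the app path you mentioned above and the usual per-session output directory (the session id is the UUID shown on the session card):

# submit.yml.erb lives alongside the app itself
cat /var/www/ood/apps/sys/RStudio/submit.yml.erb
# output.log is written into the per-session output directory under your home
cat ~/ondemand/data/sys/dashboard/batch_connect/sys/RStudio/output/<session-id>/output.log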
Setting VNC password...
Starting VNC server...
Desktop 'TurboVNC: compute-0.internal:1 (rbryant)' started on display compute-0.internal:1
Log file is vnc.log
Successfully started VNC server on compute-0:5901...
Script starting...
Starting websocket server...
The system default contains no modules
(env var: LMOD_SYSTEM_DEFAULT_MODULES is empty)
No changes in loaded modules
+ xfwm4 --compositor=off --daemon --sm-client-disable
xfwm4: Unknown option --daemon.
Type "xfwm4 --help" for usage.
+ xsetroot -solid '#D3D3D3'
+ xfsettingsd --sm-client-disable
WebSocket server settings:
- Listen on :6095
- No SSL/TLS support (no cert file)
- Backgrounding (daemon)
Scanning VNC log file for user authentications...
Generating connection YAML file...
QStandardPaths: XDG_RUNTIME_DIR not set, defaulting to '/tmp/runtime-rbryant'
xfsettingsd: No window manager registered on screen 0.
(xfsettingsd:241709): xfsettingsd-WARNING **: 15:28:59.789: Failed to get the _NET_NUMBER_OF_DESKTOPS property.
xfsettingsd: Another instance took over. Leaving...
+ xfce4-panel --sm-client-disable
(xfce4-panel:242075): xfce4-panel-WARNING **: 15:29:02.624: Failed to connect to the D-BUS session bus: Could not connect: No such file or directory
(xfce4-panel:242075): xfce4-panel-CRITICAL **: 15:29:02.625: Name org.xfce.Panel lost on the message dbus, exiting.
xfce4-panel: There is already a running instance
Setting VNC password...
Generating connection YAML file...
Setting VNC password...
Generating connection YAML file...
slurmstepd: error: *** JOB 39 ON compute-0 CANCELLED AT 2023-02-21T21:28:56 DUE TO TIME LIMIT ***
I removed the --daemon flag and this is what I got:
Setting VNC password...
Starting VNC server...
Desktop 'TurboVNC: compute-0.internal:1 (rbryant)' started on display compute-0.internal:1
Log file is vnc.log
Successfully started VNC server on compute-0:5901...
Script starting...
Starting websocket server...
The system default contains no modules
(env var: LMOD_SYSTEM_DEFAULT_MODULES is empty)
No changes in loaded modules
+ xfwm4 --compositor=off --sm-client-disable
WebSocket server settings:
- Listen on :5908
- No SSL/TLS support (no cert file)
- Backgrounding (daemon)
Scanning VNC log file for user authentications...
Generating connection YAML file...
QStandardPaths: XDG_RUNTIME_DIR not set, defaulting to '/tmp/runtime-rbryant'
Setting VNC password...
Generating connection YAML file...
+ xsetroot -solid '#D3D3D3'
+ xfsettingsd --sm-client-disable
xfsettingsd: Could not connect: No such file or directory.
(xfsettingsd:265112): xfsettingsd-ERROR **: 20:42:18.999: Failed to connect to the dbus session bus.
/users/rbryant/ondemand/data/sys/dashboard/batch_connect/sys/RStudio/output/9208e3d4-c73a-4c10-9ca5-c06a5b5594cf/script.sh: line 26: 265112 Trace/breakpoint trap (core dumped) xfsettingsd --sm-client-disable
+ xfce4-panel --sm-client-disable
(xfce4-panel:265123): xfce4-panel-WARNING **: 20:42:19.623: Failed to connect to the D-BUS session bus: Could not connect: No such file or directory
(xfce4-panel:265123): xfce4-panel-CRITICAL **: 20:42:19.623: Name org.xfce.Panel lost on the message dbus, exiting.
xfce4-panel: There is already a running instance
It doesn’t look like the connection to the D-Bus session bus is being made, but the app is running. Can you insert a command in the script.sh.erb to check whether dbus is in fact running? Something like ps -ef | grep dbus-daemon may work.
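Just as a diagnostic sketch, assuming your script.sh.erb already has set -x near the top so the commands get echoed into output.log:

# Diagnostic only: show which user the job runs as and any dbus-daemon
# processes visible on the node. The system bus (owned by the dbus user)
# should always be there; the question is whether a per-user session bus exists.
whoami
sh -c 'ps -ef | grep dbus-daemon'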
Setting VNC password...
Starting VNC server...
Desktop 'TurboVNC: compute-0.internal:1 (rbryant)' started on display compute-0.internal:1
Log file is vnc.log
Successfully started VNC server on compute-0:5901...
Script starting...
Starting websocket server...
The system default contains no modules
(env var: LMOD_SYSTEM_DEFAULT_MODULES is empty)
No changes in loaded modules
+ whoami
rbryant
+ sh -c 'ps -ef | grep dbus-daemon'
dbus 855 1 0 Feb13 ? 00:00:03 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
rbryant 279311 279290 0 15:27 ? 00:00:00 sh -c ps -ef | grep dbus-daemon
rbryant 279315 279311 0 15:27 ? 00:00:00 grep dbus-daemon
+ xfwm4 --compositor=off --sm-client-disable
WebSocket server settings:
- Listen on :6050
- No SSL/TLS support (no cert file)
- Backgrounding (daemon)
Scanning VNC log file for user authentications...
Generating connection YAML file...
QStandardPaths: XDG_RUNTIME_DIR not set, defaulting to '/tmp/runtime-rbryant'
Setting VNC password...
Generating connection YAML file...
+ xsetroot -solid '#D3D3D3'
+ xfsettingsd --sm-client-disable
xfsettingsd: Could not connect: No such file or directory.
(xfsettingsd:279777): xfsettingsd-ERROR **: 15:27:23.629: Failed to connect to the dbus session bus.
/users/rbryant/ondemand/data/sys/dashboard/batch_connect/sys/RStudio/output/423acdec-8ec5-4a29-9973-cfae1ef07d61/script.sh: line 28: 279777 Trace/breakpoint trap (core dumped) xfsettingsd --sm-client-disable
+ xfce4-panel --sm-client-disable
(xfce4-panel:279781): xfce4-panel-WARNING **: 15:27:23.844: Failed to connect to the D-BUS session bus: Could not connect: No such file or directory
(xfce4-panel:279781): xfce4-panel-CRITICAL **: 15:27:23.844: Name org.xfce.Panel lost on the message dbus, exiting.
xfce4-panel: There is already a running instance
There is nothing in /var/log/messages from today or yesterday when I grep dbus.
This is not an upgrade but a rebuild. The existing cluster is CentOS 7 and the new cluster is on Rocky 8. I use the same Ansible code to deploy new compute nodes in the existing cluster as in this rebuilt cluster, but I do have switches for OS-dependent things so that it picks the older version or the newer one.
There was nothing in the ~/.cache/sessions directory to delete.
We’re finding that the XFCE scripts we use do not work well on RHEL/8.
We can easily replicate the same core dumps on our RHEL/8 systems.
(xfsettingsd:279777): xfsettingsd-ERROR **: 15:27:23.629: Failed to connect to the dbus session bus.
/users/rbryant/ondemand/data/sys/dashboard/batch_connect/sys/RStudio/output/423acdec-8ec5-4a29-9973-cfae1ef07d61/script.sh: line 28: 279777 Trace/breakpoint trap (core dumped) xfsettingsd --sm-client-disable
+ xfce4-panel --sm-client-disable
We haven’t migrated any of our applications to our new system, so we haven’t started that work yet.
That said, we’ll continue to look into it on our side, since we will need to migrate our apps to our new system anyhow; we may as well do it now while other folks are having the same issues.
Yes. I’ll work on RHEL/8 to see what’s what and update this ticket if I have any fixes available.
But yes, as of right now I would say there’s nothing wrong with your system; these scripts just don’t work on RHEL/8, and we need to figure out what will.
Soon. I’ve found that with XFCE 4.14, some things are daemons and some are not. Previously we had everything in a backgrounded block, () &. I’ve found that you have to background xfwm4 and xfce4-panel yourself, and I’ve had to add a few sleep commands just to be sure everything has time to do its business before the next command is issued.
Instead of using the () & block in your script.sh.erb, try this:
export SEND_256_COLORS_TO_REMOTE=1
# Point XFCE's config/data dirs at this session's staged root so each session
# keeps its own state; the cache goes to a throwaway temp dir.
export XDG_CONFIG_HOME="<%= session.staged_root.join("config") %>"
export XDG_DATA_HOME="<%= session.staged_root.join("share") %>"
export XDG_CACHE_HOME="$(mktemp -d)"
module restore
set -x
# Start the window manager in the background and give it time to settle
# before anything else starts.
xfwm4 --sm-client-disable &
sleep 5
xsetroot -solid "#D3D3D3"
# xfsettingsd daemonizes itself with --daemon, so it is not backgrounded here.
xfsettingsd --daemon --sm-client-disable
# The panel has to be backgrounded, and again gets a moment to come up.
xfce4-panel --sm-client-disable &
sleep 5
# instead of booting firefox here, boot the program you want to start.
firefox
I’m going to set this particular cluster up as CentOS 7 and then migrate later. This is not our main cluster, and this particular one would be an easy transition. Our main cluster has a whole ordering of things for the 8 transition, where other pieces of tech debt must come before it and others must come after, so that’s why I was asking about timing.