I have an OOD implementation that works on one cluster/machine, but an identical installation on a different cluster/machine does not work.
Error from output.log that is different between the two:
dbus-update-activation-environment: warning: error sending to systemd: org.freedesktop.DBus.Error.Spawn.ChildExited: Process org.freedesktop.systemd1 exited with status 1
I’ve been chasing this issue for a couple of days. Fresh copies of XFCE, websockify, turbovnc, etc. have all been installed, reinstalled, and copied from the working implementation; nothing thus far has worked.
Surely I’m missing something simple, probably related to dbus or systemd. Anyone got ideas?
Yes, it is definitely a systemd/dbus error. Do you see anything relevant in journalctl or similar?
Off the top of my head, I’d check to see if systemd started a dbus-daemon for you.
I just spot checked our systems and I see this in my ps -elf output. Note the user is dbus and the parent PID (the fifth column) is 1, so systemd (being PID 1) started it directly. The leading 4 is the flags column, not a PID.
4 S dbus 2682 1 0 80 0 - 20660 ep_pol Mar31 ? 00:20:40 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation
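To read that line: in ps -elf output the columns are F, S, UID, PID, PPID and so on, so the leading 4 is the flags field, 2682 is the PID, and the 1 after it is the parent PID. A small sketch that pulls those three fields out of a line in that format (the helper name ps_elf_fields is made up for illustration):

```shell
# Column order for `ps -elf`:
#   F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD
# ps_elf_fields is a hypothetical helper: it prints "user pid ppid" per input line.
ps_elf_fields() {
  awk '{ print $3, $4, $5 }'
}

# The line quoted above, fed through the helper.
sample='4 S dbus 2682 1 0 80 0 - 20660 ep_pol Mar31 ? 00:20:40 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation'
printf '%s\n' "$sample" | ps_elf_fields
# prints: dbus 2682 1
```

On a live node the same helper could be fed from ps -elf piped through grep dbus.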
Other things I’d ask are if you had to set/reset any XDG related environment variables on the other cluster. For example, OSC sets XDG_RUNTIME_DIR for desktops in their submit.yml.erb.
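A quick way to compare the two clusters is to print the session-related variables from inside the job on each and compare the results. A minimal sketch, assuming it is pasted near the top of the desktop script; report_var is a hypothetical helper, and DBUS_SESSION_BUS_ADDRESS is an extra variable added here only because the error is dbus related:

```shell
# report_var NAME VALUE: print the variable, or say that it is missing.
report_var() {
  if [ -n "$2" ]; then
    printf '%s=%s\n' "$1" "$2"
  else
    printf '%s is unset or empty\n' "$1"
  fi
}

# Report the variables most likely to differ between the two clusters.
report_session_env() {
  report_var XDG_RUNTIME_DIR "${XDG_RUNTIME_DIR-}"
  report_var XDG_CONFIG_DIRS "${XDG_CONFIG_DIRS-}"
  report_var DBUS_SESSION_BUS_ADDRESS "${DBUS_SESSION_BUS_ADDRESS-}"
}

report_session_env
```

The output lands in output.log, so the working and non-working sessions can be compared side by side.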
From assigned node:
# psg dbus
UID PID PPID C STIME TTY TIME CMD
dbus 1024 1 0 May02 ? 00:00:00 /usr/bin/dbus-broker-launch --scope system --audit
dbus 1027 1024 0 May02 ? 00:00:00 dbus-broker --log 4 --controller 9 --machine-id 70d5e3c2565747cea436607418f94278 --max-bytes 536870912 --max-fds 4096 --max-matches 131072 --audit
huston.+ 95024 1 0 14:37 ? 00:00:00 /usr/bin/dbus-daemon --syslog --fork --print-pid 7 --print-address 9 --session
huston.+ 95036 95031 0 14:37 ? 00:00:00 /usr/bin/dbus-daemon --config-file=/usr/share/defaults/at-spi2/accessibility.conf --nofork --print-address 3
and the info about what is running
# psg huston.rogers
UID PID PPID C STIME TTY TIME CMD
huston.+ 94975 1 0 14:37 ? 00:00:00 /apps/other/ood-depends/turbovnc-3.1.1/bin/Xvnc :1 -desktop TurboVNC: atlas-0022:1 (huston.rogers) -auth /home/huston.rogers/.Xauthority -geometry 1240x900 -depth 24 -rfbauth vnc.passwd -x509cert /home/huston.rogers/.vnc/x509_cert.pem -x509key /home/huston.rogers/.vnc/x509_private.pem -rfbport 5901 -fp catalogue:/etc/X11/fontpath.d -deferupdate 1 -dridir /usr/lib64/dri -idletimeout 0
huston.+ 94983 94912 0 14:37 ? 00:00:00 bash /home/huston.rogers/ondemand/data/sys/dashboard/batch_connect/sys/atlas-desktop/output/19154c5f-3034-4d1c-b913-230fe35adacb/script.sh
Nothing notable in journalctl. The dbus-daemon starts, and shows
May 03 14:37:18 atlas-0022 dbus-daemon[95024]: [session uid=1727050709 pid=95022] Activating service name='org.freedesktop.systemd1' requested by ':1.5' (uid=1727050709 pid=95047 comm="dbus-update-activation-environment --systemd SSH_A" >
May 03 14:37:18 atlas-0022 dbus-daemon[95024]: [session uid=1727050709 pid=95022] Activated service 'org.freedesktop.systemd1' failed: Process org.freedesktop.systemd1 exited with status 1
Which is basically the same as the output.log error.
And in template/desktop/xfce.sh:
export XDG_CONFIG_DIRS="/apps/other/ood-depends/xfce-4.18.0/etc:$XDG_CONFIG_DIRS"
and that folder exists:
# ls -lad /apps/other/ood-depends/xfce-4.18.0/etc/
drwxr-xr-x 3 root root 25 May 1 20:03 /apps/other/ood-depends/xfce-4.18.0/etc/
I see dbus-daemon’s launched by you, but I don’t see them launched by dbus. I found our systemd service file here: /usr/lib/systemd/system/dbus.service. I don’t know if you have a similar service enabled. I’d ask you to check the working cluster to see if you can find the dbus-daemon launched by systemd on it.
snowbird294:
requested by ':1.5'
You’re requesting a display on the 5th screen here. This could be a red herring, but I feel like a headless compute node likely does not have 5 or more screens. I’m trying to figure out how screens are defined in Xorg, but I suspect this is somehow off, though it may or may not be the actual issue.
xrandr can tell you how many screens you have, though I’m still looking for my Xorg.conf file; I believe they’re defined in that file.
dbus running according to systemctl on the non-working cluster (md formatting made it red but it’s actually fine)
# systemctl status dbus
● dbus-broker.service - D-Bus System Message Bus
Loaded: loaded (/usr/lib/systemd/system/dbus-broker.service; enabled; vendor preset: enabled)
Active: active (running) since Thu 2024-05-02 14:03:17 CDT; 1 day 1h ago
TriggeredBy: ● dbus.socket
Docs: man:dbus-broker-launch(1)
Main PID: 1024 (dbus-broker-lau)
Tasks: 2 (limit: 2451276)
Memory: 3.9M
CPU: 863ms
CGroup: /system.slice/dbus-broker.service
├─1024 /usr/bin/dbus-broker-launch --scope system --audit
└─1027 dbus-broker --log 4 --controller 9 --machine-id 70d5e3c2565747cea436607418f94278 --max-bytes 53687091>
the dbus service file on non-working
# cat /usr/lib/systemd/system/dbus-broker.service
[Unit]
Description=D-Bus System Message Bus
Documentation=man:dbus-broker-launch(1)
DefaultDependencies=false
Before=basic.target shutdown.target
Requires=dbus.socket
Conflicts=shutdown.target
[Service]
Type=notify
Sockets=dbus.socket
OOMScoreAdjust=-900
LimitNOFILE=16384
ProtectSystem=full
PrivateTmp=true
PrivateDevices=true
ExecStart=/usr/bin/dbus-broker-launch --scope system --audit
ExecReload=/usr/bin/busctl call org.freedesktop.DBus /org/freedesktop/DBus org.freedesktop.DBus ReloadConfig
[Install]
Alias=dbus.service
on working:
# psg dbus
UID PID PPID C STIME TTY TIME CMD
dbus 1442 1 0 Apr24 ? 00:00:00 /usr/bin/dbus-broker-launch --scope system --audit
dbus 1443 1442 0 Apr24 ? 00:00:00 dbus-broker --log 4 --controller 9 --machine-id 1f91d5d3705e4bd5b12d901b4e381f97 --max-bytes 536870912 --max-fds 4096 --max-matches 131072 --audit
jhrogers 330597 1 0 15:14 ? 00:00:00 /usr/bin/dbus-daemon --syslog --fork --print-pid 6 --print-address 8 --session
jhrogers 330610 330604 0 15:14 ? 00:00:00 /usr/bin/dbus-daemon --config-file=/usr/share/defaults/at-spi2/accessibility.conf --nofork --print-address 3
root 330922 330813 0 15:14 pts/0 00:00:00 grep -E --color=auto (dbus|UID)
[root@hercules-07-06 ~]# psg jhrogers
UID PID PPID C STIME TTY TIME CMD
jhrogers 330500 330495 0 15:14 ? 00:00:00 /bin/bash /var/spool/slurmd/job1144930/slurm_script
jhrogers 330548 1 0 15:14 ? 00:00:00 /apps/other/ood-depends/turbovnc-3.1.1/bin/Xvnc :1 -desktop TurboVNC: hercules-07-06:1 (jhrogers) -auth /home/jhrogers/.Xauthority -geometry 800x600 -depth 24 -rfbauth vnc.passwd -x509cert /home/jhrogers/.vnc/x509_cert.pem -x509key /home/jhrogers/.vnc/x509_private.pem -rfbport 5901 -fp catalogue:/etc/X11/fontpath.d -deferupdate 1 -dridir /usr/lib64/dri -idletimeout 0
jhrogers 330556 330500 0 15:14 ? 00:00:00 bash /home/jhrogers/ondemand/data/sys/dashboard/batch_connect/sys/hercules-desktop/output/bfaf560f-63f9-4bb9-8a17-93132b1b99a4/script.sh
jhrogers 330582 1 0 15:14 ? 00:00:00 /usr/bin/python3 /apps/other/ood-depends/websockify-0.11.0/bin/websockify 11059 localhost:5901
jhrogers 330597 1 0 15:14 ? 00:00:00 /usr/bin/dbus-daemon --syslog --fork --print-pid 6 --print-address 8 --session
jhrogers 330598 330556 0 15:14 ? 00:00:00 /apps/other/ood-depends/xfce-4.18.0/lib/xfce4/xfconf/xfconfd
jhrogers 330599 330556 0 15:14 ? 00:00:00 xfce4-session
jhrogers 330604 1 0 15:14 ? 00:00:00 /usr/libexec/at-spi-bus-launcher
jhrogers 330610 330604 0 15:14 ? 00:00:00 /usr/bin/dbus-daemon --config-file=/usr/share/defaults/at-spi2/accessibility.conf --nofork --print-address 3
jhrogers 330615 1 0 15:14 ? 00:00:00 /usr/libexec/at-spi2-registryd --use-gnome-session
jhrogers 330619 330599 0 15:14 ? 00:00:00 xfwm4
jhrogers 330626 330599 0 15:14 ? 00:00:00 xfsettingsd
jhrogers 330629 330599 0 15:14 ? 00:00:00 xfce4-panel
jhrogers 330635 330599 0 15:14 ? 00:00:00 Thunar --daemon
jhrogers 330640 330599 0 15:14 ? 00:00:00 xfdesktop
jhrogers 330655 330500 0 15:14 ? 00:00:00 /bin/bash /var/spool/slurmd/job1144930/slurm_script
jhrogers 330658 330655 0 15:14 ? 00:00:00 /bin/bash /var/spool/slurmd/job1144930/slurm_script
jhrogers 330659 330658 0 15:14 ? 00:00:00 tail -f --pid=330556 vnc.log
jhrogers 330662 330629 0 15:14 ? 00:00:00 /apps/other/ood-depends/xfce-4.18.0/lib/xfce4/panel/wrapper-2.0 /apps/other/ood-depends/xfce-4.18.0/lib/xfce4/panel/plugins/libsystray.so 6 10485772 systray Status Tray Plugin Provides status notifier items (application indicators) and legacy systray items
jhrogers 330669 330629 0 15:14 ? 00:00:00 /apps/other/ood-depends/xfce-4.18.0/lib/xfce4/panel/wrapper-2.0 /apps/other/ood-depends/xfce-4.18.0/lib/xfce4/panel/plugins/libactions.so 14 10485773 actions Action Buttons Log out, lock or other system actions
jhrogers 330767 330582 0 15:14 ? 00:00:00 /usr/bin/python3 /apps/other/ood-depends/websockify-0.11.0/bin/websockify 11059 localhost:5901
It’s almost like websockify isn’t quite firing up. But if I peek at the websockify log on both:
Working
WebSocket server settings:
- Listen on :11059
- No SSL/TLS support (no cert file)
- proxying from :11059 to localhost:5901
130.18.14.159 - - [03/May/2024 15:14:44] 130.18.14.159: Plain non-SSL (ws://) WebSocket connection
130.18.14.159 - - [03/May/2024 15:14:44] 130.18.14.159: Path: '/websockify'
130.18.14.159 - - [03/May/2024 15:14:44] connecting to: localhost:5901
non-working
WebSocket server settings:
- Listen on :8163
- No SSL/TLS support (no cert file)
- proxying from :8163 to localhost:5901
So is the issue related to websockify not forwarding correctly?
This is also supported by a duplicate jupyter application that works on the working cluster, but throws “URL not found” on the non-working cluster.
I also think the display number is a red herring, but I could be convinced otherwise.
Ignore that psg comparison. If you put a dot in your grep and don’t think about it too much, it gets the wrong list of what’s running.
Here’s what’s running on the non-working cluster, which matches the working cluster:
# psg huston
UID PID PPID C STIME TTY TIME CMD
huston.+ 94912 94908 0 14:37 ? 00:00:00 /bin/bash /var/spool/slurmd/job14824612/slurm_script
huston.+ 94975 1 0 14:37 ? 00:00:00 /apps/other/ood-depends/turbovnc-3.1.1/bin/Xvnc :1 -desktop TurboVNC: atlas-0022:1 (huston.rogers) -auth /home/huston.rogers/.Xauthority -geometry 1240x900 -depth 24 -rfbauth vnc.passwd -x509cert /home/huston.rogers/.vnc/x509_cert.pem -x509key /home/huston.rogers/.vnc/x509_private.pem -rfbport 5901 -fp catalogue:/etc/X11/fontpath.d -deferupdate 1 -dridir /usr/lib64/dri -idletimeout 0
huston.+ 94983 94912 0 14:37 ? 00:00:00 bash /home/huston.rogers/ondemand/data/sys/dashboard/batch_connect/sys/atlas-desktop/output/19154c5f-3034-4d1c-b913-230fe35adacb/script.sh
huston.+ 95009 1 0 14:37 ? 00:00:00 /usr/bin/python3 /apps/other/ood-depends/websockify-0.11.0/bin/websockify 8163 --heartbeat=30 localhost:5901
huston.+ 95024 1 0 14:37 ? 00:00:00 /usr/bin/dbus-daemon --syslog --fork --print-pid 7 --print-address 9 --session
huston.+ 95026 94983 0 14:37 ? 00:00:00 xfce4-session
huston.+ 95031 1 0 14:37 ? 00:00:00 /usr/libexec/at-spi-bus-launcher
huston.+ 95036 95031 0 14:37 ? 00:00:00 /usr/bin/dbus-daemon --config-file=/usr/share/defaults/at-spi2/accessibility.conf --nofork --print-address 3
huston.+ 95041 1 0 14:37 ? 00:00:00 /usr/libexec/at-spi2-registryd --use-gnome-session
huston.+ 95046 1 0 14:37 ? 00:00:00 /usr/bin/ssh-agent -s
huston.+ 95054 95026 0 14:37 ? 00:00:00 xfwm4
huston.+ 95061 95026 0 14:37 ? 00:00:00 xfsettingsd
huston.+ 95064 95026 0 14:37 ? 00:00:01 xfce4-panel
huston.+ 95071 95026 0 14:37 ? 00:00:00 Thunar --daemon
huston.+ 95076 95026 0 14:37 ? 00:00:00 xfdesktop
huston.+ 95091 95064 0 14:37 ? 00:00:00 /apps/other/ood-depends/xfce-4.18.0/lib/xfce4/panel/wrapper-2.0 /apps/other/ood-depends/xfce-4.18.0/lib/xfce4/panel/plugins/libsystray.so 6 10485772 systray Status Tray Plugin Provides status notifier items (application indicators) and legacy systray items
huston.+ 95097 94912 0 14:37 ? 00:00:00 /bin/bash /var/spool/slurmd/job14824612/slurm_script
huston.+ 95101 95097 0 14:37 ? 00:00:00 /bin/bash /var/spool/slurmd/job14824612/slurm_script
huston.+ 95102 95101 0 14:37 ? 00:00:00 tail -f --pid=94983 vnc.log
huston.+ 95105 95064 0 14:37 ? 00:00:00 /apps/other/ood-depends/xfce-4.18.0/lib/xfce4/panel/wrapper-2.0 /apps/other/ood-depends/xfce-4.18.0/lib/xfce4/panel/plugins/libactions.so 14 10485773 actions Action Buttons Log out, lock or other system actions
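For anyone comparing listings like these: ps shortens a long user name to huston.+ in the UID column, so a pattern containing the full huston.rogers can only match lines where the full name also appears in the command. A small sketch using two lines copied from the listing above (count_matches is a hypothetical helper):

```shell
# Two lines copied from the listing above.
listing='huston.+ 94983 94912 0 14:37 ? 00:00:00 bash /home/huston.rogers/ondemand/data/sys/dashboard/batch_connect/sys/atlas-desktop/output/19154c5f-3034-4d1c-b913-230fe35adacb/script.sh
huston.+ 95026 94983 0 14:37 ? 00:00:00 xfce4-session'

# count_matches PATTERN: count the lines of $listing that match PATTERN.
count_matches() {
  printf '%s\n' "$listing" | grep -c -- "$1"
}

count_matches 'huston.rogers'   # 1: only the line whose command contains the full name
count_matches 'huston'          # 2: also matches the truncated user column
```

On the node itself, selecting by user instead of grepping, for example ps -u huston.rogers -o user:32,pid,ppid,args, should sidestep the truncation, assuming a procps-ng ps.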
What is your OS and version? I’m surprised you don’t have a dbus.service; what you’re showing is dbus-broker, which could be different? Though it doesn’t appear to be on the system that’s working, so maybe that’s a red herring too…
IDK about your Jupyter issues, but Jupyter is a web application so there’s no X11/VNC/dbus involved with that application.
OS = Rocky-9.1
Checking reverse proxy stuff, based on websockify not connecting, and jupyter not connecting:
nc -l 5432
This address generates stuff
https://atlas-ood.hpc.msstate.edu/node/atlas-0022.hpc.msstate.edu/5432
This one doesn’t
https://atlas-ood.hpc.msstate.edu/node/atlas-0022/5432
snowbird294:
So is the issue related to websockify not forwarding correctly?
This is also supported by a duplicate jupyter application that works on the working cluster, but throws “URL not found” on the non-working cluster.
This was the right place to look. My OOD portal file did not match the cluster file. The portal file had the FQDN in the regex, but the hosts were using their short names:
host=$(hostname | awk -F. '{print $1}' | tr A-Z a-z )
That fixed desktop and jupyter, but not rstudio:
xfsettingsd: No window manager registered on screen 0.
(xfsettingsd:105274): xfsettingsd-WARNING **: 15:48:12.474: Failed to get the _NET_NUMBER_OF_DESKTOPS property.
Any parallel thoughts with the displays herring?
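Going back to the portal regex mismatch for a moment, here is a rough sketch of the short-name versus FQDN check. The regex value is hypothetical (the real one lives in the portal config), both helper names are made up, and the way the portal actually applies its regex is only approximated with an anchored grep -E:

```shell
# short_host NAME: same transformation as the host= line above,
# applied to an argument instead of the output of `hostname`.
short_host() {
  printf '%s\n' "$1" | awk -F. '{print $1}' | tr 'A-Z' 'a-z'
}

# matches_host_regex NAME REGEX: succeed if NAME fully matches REGEX.
matches_host_regex() {
  printf '%s\n' "$1" | grep -Eq -- "^($2)\$"
}

# Hypothetical portal regex that only accepts fully qualified names.
fqdn_only='atlas-[0-9]+\.hpc\.msstate\.edu'

name=$(short_host 'Atlas-0022.hpc.msstate.edu')
echo "$name"
if matches_host_regex "$name" "$fqdn_only"; then
  echo "short name accepted"
else
  echo "short name rejected"
fi
```

With a regex like that, the short name is rejected while the FQDN is accepted, which matches the two /node/ URLs tested earlier.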
rstudio is also a web application (which doesn’t require X11/VNC/dbus), unless you’re running the other variant, which I can’t remember off the top of my head.
We are running the other version, I believe:
#
# Launch Xfce Window Manager and Panel
#
(
export BASE="/apps/other/ood-depends/"
export PATH="$BASE/xfce-4.18.0/bin:$BASE/xfce-4.18.0/sbin:$BASE/turbovnc-3.1.1/bin:$BASE/nmap-7.94/bin:$BASE/nmap-7.94/contrib/bin:$PATH"
export LD_LIBRARY_PATH="$BASE/xfce-4.18.0/lib:$BASE/xfce-4.18.0/contrib/lib64:$BASE/turbovnc-3.1.1/contrib/lib64:$BASE/nmap-7.94:$BASE/nmap-7.94/contrib/lib:$LD_LIBRARY_PATH"
unset BASE
export SEND_256_COLORS_TO_REMOTE=1
export XDG_CONFIG_HOME="<%= session.staged_root.join("config") %>"
export XDG_DATA_HOME="<%= session.staged_root.join("share") %>"
export XDG_CACHE_HOME="$(mktemp -d)"
export $(dbus-launch)
module restore
set -x
xfwm4 --compositor=off --daemon --sm-client-disable
xsetroot -solid "#D3D3D3"
xfsettingsd --sm-client-disable
xfce4-panel --sm-client-disable
) &
#
# Load the required environment
module load rstudio/2024.04.0
# Launch
module list
set -x
rstudio
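One observation about the script above, not a confirmed cause: export $(dbus-launch) runs inside the ( ... ) & subshell, and variables exported in a subshell are not visible to the parent shell that later starts rstudio. A minimal sketch of that scoping rule, using a made-up variable and value in place of the real dbus-launch output:

```shell
# DEMO_BUS_ADDRESS and its value are stand-ins for what
# `export $(dbus-launch)` would set inside the subshell.
unset DEMO_BUS_ADDRESS
(
  export DEMO_BUS_ADDRESS="unix:path=/tmp/demo-bus"
  echo "inside subshell: ${DEMO_BUS_ADDRESS}"
)
# Back in the parent shell the variable was never set.
echo "after subshell: ${DEMO_BUS_ADDRESS:-<unset>}"
```

Since the same script reportedly works on the other cluster, this may not matter here, but it could be worth checking which bus address rstudio actually sees.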
No, but googling that seems to indicate that XFCE is responding to files in your ~/.cache/sessions, so maybe it’s worth a shot to remove/backup things in your home directories.
Removed the cache, still catching the same error:
xfsettingsd: No window manager registered on screen 0.
(xfsettingsd:105924): xfsettingsd-WARNING **: 16:00:18.534: Failed to get the _NET_NUMBER_OF_DESKTOPS property.
[105929:0503/160021.656776:FATAL:bus.cc(1246)] D-Bus connection was disconnected. Aborting.
/home/huston.rogers/ondemand/data/sys/dashboard/batch_connect/sys/rstudio-2024-04-0/output/082b114b-7c83-483f-b271-598ab3253883/script.sh: line 41: 105929 Trace/breakpoint trap (core dumped) rstudio
Cleaning up...
Killing Xvnc process ID 105882
+ xfce4-panel --sm-client-disable
xfce4-panel: Cannot open display: .
Type "xfce4-panel --help" for usage
Almost as if the display isn’t getting assigned.
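The "Cannot open display: ." message suggests DISPLAY was empty when xfce4-panel started. A small guard that could go just before the xfwm4 line to make that visible in output.log; a sketch only, and require_display is a hypothetical helper:

```shell
# require_display: fail with a message on stderr when DISPLAY is unset or empty.
require_display() {
  if [ -z "${DISPLAY-}" ]; then
    echo "DISPLAY is empty; X clients such as xfce4-panel cannot connect" >&2
    return 1
  fi
  echo "DISPLAY=${DISPLAY}"
}

# In the real script this would more likely be: require_display || exit 1
require_display || echo "would abort here"
```

If the guard fires only after rstudio has crashed, the empty display may just be a side effect of the cleanup killing Xvnc rather than the original problem.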