Error when connecting to interactive desktop

I have an OOD implementation that works on one cluster/machine, but an identical installation on a different cluster/machine does not work.

Error from output.log that is different between the two:

dbus-update-activation-environment: warning: error sending to systemd: org.freedesktop.DBus.Error.Spawn.ChildExited: Process org.freedesktop.systemd1 exited with status 1

I’ve been chasing this issue for a couple of days. I’ve installed fresh copies of XFCE, websockify, TurboVNC, etc., reinstalled them, and copied them over from the working implementation; nothing has worked so far.

Surely I’m missing something simple, probably related to dbus or systemd. Anyone got ideas?

Yes it is definitely a systemd/dbus error. Do you see anything relevant in journalctl or similar?

Off the top of my head, I’d check whether systemd started a dbus-daemon for you.

I just spot-checked our systems and see this in my ps -elf output. Note that the user is dbus and the parent PID is 1, so systemd (being PID 1) launched it pretty early.

4 S dbus      2682     1  0  80   0 - 20660 ep_pol Mar31 ?        00:20:40 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation

I’d also ask whether you had to set or reset any XDG-related environment variables on the other cluster. For example, OSC sets XDG_RUNTIME_DIR for desktops in their submit.yml.erb.
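
If you want to try the XDG_RUNTIME_DIR route, here is a minimal bash sketch. It is not OSC’s actual submit.yml.erb; it assumes you can add a few lines to the desktop app’s job script (for example its before.sh.erb), and the function name `ensure_runtime_dir` is hypothetical. It only creates a private per-job directory when XDG_RUNTIME_DIR is unset or points at a directory that doesn’t exist on the compute node.

```shell
# Hypothetical helper, not taken from OSC's configuration.
# Gives the job a private XDG_RUNTIME_DIR when it lacks a usable one.
ensure_runtime_dir() {
  if [ -z "${XDG_RUNTIME_DIR:-}" ] || [ ! -d "$XDG_RUNTIME_DIR" ]; then
    # mktemp -d creates a directory owned by the current user with mode 0700
    # (less any umask bits), as the XDG base directory spec expects.
    XDG_RUNTIME_DIR="$(mktemp -d "${TMPDIR:-/tmp}/xdg-runtime-$(id -u)-XXXXXX")" || return 1
    export XDG_RUNTIME_DIR
  fi
}

# Usage in the job script, before the desktop starts:
#   ensure_runtime_dir
#   echo "XDG_RUNTIME_DIR=$XDG_RUNTIME_DIR"
```

A directory created this way is not removed automatically, so you may want to delete it when the job ends.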

From the assigned node:

# psg dbus
UID          PID    PPID  C STIME TTY          TIME CMD
dbus        1024       1  0 May02 ?        00:00:00 /usr/bin/dbus-broker-launch --scope system --audit
dbus        1027    1024  0 May02 ?        00:00:00 dbus-broker --log 4 --controller 9 --machine-id 70d5e3c2565747cea436607418f94278 --max-bytes 536870912 --max-fds 4096 --max-matches 131072 --audit
huston.+   95024       1  0 14:37 ?        00:00:00 /usr/bin/dbus-daemon --syslog --fork --print-pid 7 --print-address 9 --session
huston.+   95036   95031  0 14:37 ?        00:00:00 /usr/bin/dbus-daemon --config-file=/usr/share/defaults/at-spi2/accessibility.conf --nofork --print-address 3

And the processes that are running:

# psg huston.rogers
UID          PID    PPID  C STIME TTY          TIME CMD
huston.+   94975       1  0 14:37 ?        00:00:00 /apps/other/ood-depends/turbovnc-3.1.1/bin/Xvnc :1 -desktop TurboVNC: atlas-0022:1 (huston.rogers) -auth /home/huston.rogers/.Xauthority -geometry 1240x900 -depth 24 -rfbauth vnc.passwd -x509cert /home/huston.rogers/.vnc/x509_cert.pem -x509key /home/huston.rogers/.vnc/x509_private.pem -rfbport 5901 -fp catalogue:/etc/X11/fontpath.d -deferupdate 1 -dridir /usr/lib64/dri -idletimeout 0
huston.+   94983   94912  0 14:37 ?        00:00:00 bash /home/huston.rogers/ondemand/data/sys/dashboard/batch_connect/sys/atlas-desktop/output/19154c5f-3034-4d1c-b913-230fe35adacb/script.sh

Nothing notable in journalctl. The dbus-daemon starts and logs:

May 03 14:37:18 atlas-0022 dbus-daemon[95024]: [session uid=1727050709 pid=95022] Activating service name='org.freedesktop.systemd1' requested by ':1.5' (uid=1727050709 pid=95047 comm="dbus-update-activation-environment --systemd SSH_A" >
May 03 14:37:18 atlas-0022 dbus-daemon[95024]: [session uid=1727050709 pid=95022] Activated service 'org.freedesktop.systemd1' failed: Process org.freedesktop.systemd1 exited with status 1

Which is basically the same as the output.log error.
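
To compare the two clusters side by side, it may help to pull only the failed activations out of the journal. A small bash filter, assuming the session bus logs under the `dbus-daemon` syslog identifier as in the lines above; `failed_activations` is a hypothetical name.

```shell
# Hypothetical filter: reduce dbus-daemon journal lines to "service: reason".
failed_activations() {
  grep -E "Activated service '[^']+' failed" |
    sed -E "s/.*Activated service '([^']+)' failed: (.*)/\1: \2/"
}

# Usage on a node (as root, so the session bus entries are readable):
#   journalctl -t dbus-daemon --since today | failed_activations
```

Running the same command on the working cluster would show whether the org.freedesktop.systemd1 activation fails there too; if it does, the warning is unlikely to be what breaks the desktop.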

And in template/desktop/xfce.sh:

export XDG_CONFIG_DIRS="/apps/other/ood-depends/xfce-4.18.0/etc:$XDG_CONFIG_DIRS"

and that folder exists:

# ls -lad /apps/other/ood-depends/xfce-4.18.0/etc/
drwxr-xr-x 3 root root 25 May  1 20:03 /apps/other/ood-depends/xfce-4.18.0/etc/
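
One detail worth checking in that export: if XDG_CONFIG_DIRS is unset when xfce.sh runs, the result is `/apps/other/ood-depends/xfce-4.18.0/etc:` with an empty trailing entry. Because the variable is then non-empty, programs following the XDG spec may no longer fall back to its default of `/etc/xdg`. A bash sketch of a safer prepend; the function name is hypothetical.

```shell
# Hypothetical helper: prepend a directory to XDG_CONFIG_DIRS while keeping
# the XDG spec's default (/etc/xdg) when the variable was unset or empty.
prepend_xdg_config_dir() {
  XDG_CONFIG_DIRS="$1:${XDG_CONFIG_DIRS:-/etc/xdg}"
  export XDG_CONFIG_DIRS
}

# Usage in xfce.sh instead of the bare export:
#   prepend_xdg_config_dir /apps/other/ood-depends/xfce-4.18.0/etc
```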

I see dbus-daemon processes launched by you, but none launched by the dbus user. I found our systemd service file at /usr/lib/systemd/system/dbus.service; I don’t know whether you have a similar service enabled. I’d check the working cluster to see whether you can find a dbus-daemon launched by systemd there.

You’re requesting a display on the 5th screen here. This could be a red herring, but I feel like a headless compute node likely does not have 5 or more screens. I’m trying to figure out how screens are defined in Xorg; I suspect something is off here, though it may or may not be the actual issue.

xrandr can tell you how many screens you have. I’m still looking for my Xorg.conf file, but I believe screens are defined there.
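
On the display/screen question: an X11 DISPLAY value has the form `host:display.screen`, where the number after the colon is the display (server) number and the optional suffix is the screen, which defaults to 0. The Xvnc processes above run as `:1`, i.e. display 1, screen 0 (and 5900 + 1 gives the `-rfbport 5901` in the same command line); a virtual Xvnc display does not need physical monitors. A small bash sketch that splits the value; `parse_display` is a hypothetical name.

```shell
# Hypothetical helper: split an X11 DISPLAY value ([host]:display[.screen]).
parse_display() {
  local value="$1"
  local re='^([^:]*):([0-9]+)(\.([0-9]+))?$'
  if [[ $value =~ $re ]]; then
    # An empty host means a local connection; a missing screen means screen 0.
    printf 'host=%s display=%s screen=%s\n' \
      "${BASH_REMATCH[1]:-local}" "${BASH_REMATCH[2]}" "${BASH_REMATCH[4]:-0}"
  else
    echo "invalid or empty DISPLAY: '$value'" >&2
    return 1
  fi
}

# Usage inside the session: parse_display "${DISPLAY:-}"
```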

systemctl reports dbus running on the non-working cluster:

# systemctl status dbus
● dbus-broker.service - D-Bus System Message Bus
     Loaded: loaded (/usr/lib/systemd/system/dbus-broker.service; enabled; vendor preset: enabled)
     Active: active (running) since Thu 2024-05-02 14:03:17 CDT; 1 day 1h ago
TriggeredBy: ● dbus.socket
       Docs: man:dbus-broker-launch(1)
   Main PID: 1024 (dbus-broker-lau)
      Tasks: 2 (limit: 2451276)
     Memory: 3.9M
        CPU: 863ms
     CGroup: /system.slice/dbus-broker.service
             ├─1024 /usr/bin/dbus-broker-launch --scope system --audit
             └─1027 dbus-broker --log 4 --controller 9 --machine-id 70d5e3c2565747cea436607418f94278 --max-bytes 53687091>

The dbus service file on the non-working cluster:

# cat /usr/lib/systemd/system/dbus-broker.service
[Unit]
Description=D-Bus System Message Bus
Documentation=man:dbus-broker-launch(1)
DefaultDependencies=false
Before=basic.target shutdown.target
Requires=dbus.socket
Conflicts=shutdown.target

[Service]
Type=notify
Sockets=dbus.socket
OOMScoreAdjust=-900
LimitNOFILE=16384
ProtectSystem=full
PrivateTmp=true
PrivateDevices=true
ExecStart=/usr/bin/dbus-broker-launch --scope system --audit
ExecReload=/usr/bin/busctl call org.freedesktop.DBus /org/freedesktop/DBus org.freedesktop.DBus ReloadConfig

[Install]
Alias=dbus.service

On the working cluster:

# psg dbus
UID          PID    PPID  C STIME TTY          TIME CMD
dbus        1442       1  0 Apr24 ?        00:00:00 /usr/bin/dbus-broker-launch --scope system --audit
dbus        1443    1442  0 Apr24 ?        00:00:00 dbus-broker --log 4 --controller 9 --machine-id 1f91d5d3705e4bd5b12d901b4e381f97 --max-bytes 536870912 --max-fds 4096 --max-matches 131072 --audit
jhrogers  330597       1  0 15:14 ?        00:00:00 /usr/bin/dbus-daemon --syslog --fork --print-pid 6 --print-address 8 --session
jhrogers  330610  330604  0 15:14 ?        00:00:00 /usr/bin/dbus-daemon --config-file=/usr/share/defaults/at-spi2/accessibility.conf --nofork --print-address 3
root      330922  330813  0 15:14 pts/0    00:00:00 grep -E --color=auto (dbus|UID)
[root@hercules-07-06 ~]# psg jhrogers
UID          PID    PPID  C STIME TTY          TIME CMD
jhrogers  330500  330495  0 15:14 ?        00:00:00 /bin/bash /var/spool/slurmd/job1144930/slurm_script
jhrogers  330548       1  0 15:14 ?        00:00:00 /apps/other/ood-depends/turbovnc-3.1.1/bin/Xvnc :1 -desktop TurboVNC: hercules-07-06:1 (jhrogers) -auth /home/jhrogers/.Xauthority -geometry 800x600 -depth 24 -rfbauth vnc.passwd -x509cert /home/jhrogers/.vnc/x509_cert.pem -x509key /home/jhrogers/.vnc/x509_private.pem -rfbport 5901 -fp catalogue:/etc/X11/fontpath.d -deferupdate 1 -dridir /usr/lib64/dri -idletimeout 0
jhrogers  330556  330500  0 15:14 ?        00:00:00 bash /home/jhrogers/ondemand/data/sys/dashboard/batch_connect/sys/hercules-desktop/output/bfaf560f-63f9-4bb9-8a17-93132b1b99a4/script.sh
jhrogers  330582       1  0 15:14 ?        00:00:00 /usr/bin/python3 /apps/other/ood-depends/websockify-0.11.0/bin/websockify 11059 localhost:5901
jhrogers  330597       1  0 15:14 ?        00:00:00 /usr/bin/dbus-daemon --syslog --fork --print-pid 6 --print-address 8 --session
jhrogers  330598  330556  0 15:14 ?        00:00:00 /apps/other/ood-depends/xfce-4.18.0/lib/xfce4/xfconf/xfconfd
jhrogers  330599  330556  0 15:14 ?        00:00:00 xfce4-session
jhrogers  330604       1  0 15:14 ?        00:00:00 /usr/libexec/at-spi-bus-launcher
jhrogers  330610  330604  0 15:14 ?        00:00:00 /usr/bin/dbus-daemon --config-file=/usr/share/defaults/at-spi2/accessibility.conf --nofork --print-address 3
jhrogers  330615       1  0 15:14 ?        00:00:00 /usr/libexec/at-spi2-registryd --use-gnome-session
jhrogers  330619  330599  0 15:14 ?        00:00:00 xfwm4
jhrogers  330626  330599  0 15:14 ?        00:00:00 xfsettingsd
jhrogers  330629  330599  0 15:14 ?        00:00:00 xfce4-panel
jhrogers  330635  330599  0 15:14 ?        00:00:00 Thunar --daemon
jhrogers  330640  330599  0 15:14 ?        00:00:00 xfdesktop
jhrogers  330655  330500  0 15:14 ?        00:00:00 /bin/bash /var/spool/slurmd/job1144930/slurm_script
jhrogers  330658  330655  0 15:14 ?        00:00:00 /bin/bash /var/spool/slurmd/job1144930/slurm_script
jhrogers  330659  330658  0 15:14 ?        00:00:00 tail -f --pid=330556 vnc.log
jhrogers  330662  330629  0 15:14 ?        00:00:00 /apps/other/ood-depends/xfce-4.18.0/lib/xfce4/panel/wrapper-2.0 /apps/other/ood-depends/xfce-4.18.0/lib/xfce4/panel/plugins/libsystray.so 6 10485772 systray Status Tray Plugin Provides status notifier items (application indicators) and legacy systray items
jhrogers  330669  330629  0 15:14 ?        00:00:00 /apps/other/ood-depends/xfce-4.18.0/lib/xfce4/panel/wrapper-2.0 /apps/other/ood-depends/xfce-4.18.0/lib/xfce4/panel/plugins/libactions.so 14 10485773 actions Action Buttons Log out, lock or other system actions
jhrogers  330767  330582  0 15:14 ?        00:00:00 /usr/bin/python3 /apps/other/ood-depends/websockify-0.11.0/bin/websockify 11059 localhost:5901

It’s almost as if websockify isn’t quite firing up. But compare the websockify logs on both:

Working

WebSocket server settings:
  - Listen on :11059
  - No SSL/TLS support (no cert file)
  - proxying from :11059 to localhost:5901
130.18.14.159 - - [03/May/2024 15:14:44] 130.18.14.159: Plain non-SSL (ws://) WebSocket connection
130.18.14.159 - - [03/May/2024 15:14:44] 130.18.14.159: Path: '/websockify'
130.18.14.159 - - [03/May/2024 15:14:44] connecting to: localhost:5901

Non-working

WebSocket server settings:
  - Listen on :8163
  - No SSL/TLS support (no cert file)
  - proxying from :8163 to localhost:5901

So is the issue related to websockify not forwarding correctly?
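
The comparison above can be made mechanical. The non-working log contains only the startup banner, with no `connecting to:` line, which suggests no browser connection ever reached websockify; that would put the problem upstream (for example in the reverse proxy) rather than between websockify and VNC. A bash sketch that counts those lines; the log path is whatever your job writes, and the function names are hypothetical.

```shell
# Hypothetical helper: count proxied connections recorded in a websockify log.
websockify_connections() {
  # grep -c prints 0 and exits 1 when nothing matches; "|| true" keeps the 0.
  grep -c 'connecting to:' "$1" || true
}

diagnose_websockify() {
  local n
  [ -r "$1" ] || { echo "cannot read $1" >&2; return 2; }
  n="$(websockify_connections "$1")"
  if [ "$n" -eq 0 ]; then
    echo "no client reached websockify; check the reverse proxy path"
  else
    echo "$n proxied connection(s); check the VNC side"
  fi
}
```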

This is also supported by a duplicate Jupyter application that works on the working cluster but throws “URL not found” on the non-working cluster.

I also think the display number is a red herring, but I could be convinced otherwise.

Ignore that psg comparison. If you put a dot in your grep and don’t think about it too much, you get the wrong list of what’s running.
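
For the record, the dot alone probably isn’t what shortened the first `psg huston.rogers` listing, since `.` also matches a literal dot. What ps does is truncate long user names in the UID column (hence `huston.+`), so that grep only matched lines whose command line contained `/home/huston.rogers`. Selecting processes by user avoids the issue. A bash sketch, assuming the procps-ng `ps` that ships with Rocky; the helper name `psu` is hypothetical and is not your site’s `psg`.

```shell
# Hypothetical helper: list one user's processes without grepping ps output.
# -u selects by effective user (name or numeric UID); user:24 widens the
# USER column so long names such as huston.rogers are not truncated.
psu() {
  ps -u "$1" -o user:24,pid,ppid,stime,cmd
}

# Example: psu huston.rogers
```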

Here’s what’s running on the non-working cluster, which matches the working cluster:

# psg huston
UID          PID    PPID  C STIME TTY          TIME CMD
huston.+   94912   94908  0 14:37 ?        00:00:00 /bin/bash /var/spool/slurmd/job14824612/slurm_script
huston.+   94975       1  0 14:37 ?        00:00:00 /apps/other/ood-depends/turbovnc-3.1.1/bin/Xvnc :1 -desktop TurboVNC: atlas-0022:1 (huston.rogers) -auth /home/huston.rogers/.Xauthority -geometry 1240x900 -depth 24 -rfbauth vnc.passwd -x509cert /home/huston.rogers/.vnc/x509_cert.pem -x509key /home/huston.rogers/.vnc/x509_private.pem -rfbport 5901 -fp catalogue:/etc/X11/fontpath.d -deferupdate 1 -dridir /usr/lib64/dri -idletimeout 0
huston.+   94983   94912  0 14:37 ?        00:00:00 bash /home/huston.rogers/ondemand/data/sys/dashboard/batch_connect/sys/atlas-desktop/output/19154c5f-3034-4d1c-b913-230fe35adacb/script.sh
huston.+   95009       1  0 14:37 ?        00:00:00 /usr/bin/python3 /apps/other/ood-depends/websockify-0.11.0/bin/websockify 8163 --heartbeat=30 localhost:5901
huston.+   95024       1  0 14:37 ?        00:00:00 /usr/bin/dbus-daemon --syslog --fork --print-pid 7 --print-address 9 --session
huston.+   95026   94983  0 14:37 ?        00:00:00 xfce4-session
huston.+   95031       1  0 14:37 ?        00:00:00 /usr/libexec/at-spi-bus-launcher
huston.+   95036   95031  0 14:37 ?        00:00:00 /usr/bin/dbus-daemon --config-file=/usr/share/defaults/at-spi2/accessibility.conf --nofork --print-address 3
huston.+   95041       1  0 14:37 ?        00:00:00 /usr/libexec/at-spi2-registryd --use-gnome-session
huston.+   95046       1  0 14:37 ?        00:00:00 /usr/bin/ssh-agent -s
huston.+   95054   95026  0 14:37 ?        00:00:00 xfwm4
huston.+   95061   95026  0 14:37 ?        00:00:00 xfsettingsd
huston.+   95064   95026  0 14:37 ?        00:00:01 xfce4-panel
huston.+   95071   95026  0 14:37 ?        00:00:00 Thunar --daemon
huston.+   95076   95026  0 14:37 ?        00:00:00 xfdesktop
huston.+   95091   95064  0 14:37 ?        00:00:00 /apps/other/ood-depends/xfce-4.18.0/lib/xfce4/panel/wrapper-2.0 /apps/other/ood-depends/xfce-4.18.0/lib/xfce4/panel/plugins/libsystray.so 6 10485772 systray Status Tray Plugin Provides status notifier items (application indicators) and legacy systray items
huston.+   95097   94912  0 14:37 ?        00:00:00 /bin/bash /var/spool/slurmd/job14824612/slurm_script
huston.+   95101   95097  0 14:37 ?        00:00:00 /bin/bash /var/spool/slurmd/job14824612/slurm_script
huston.+   95102   95101  0 14:37 ?        00:00:00 tail -f --pid=94983 vnc.log
huston.+   95105   95064  0 14:37 ?        00:00:00 /apps/other/ood-depends/xfce-4.18.0/lib/xfce4/panel/wrapper-2.0 /apps/other/ood-depends/xfce-4.18.0/lib/xfce4/panel/plugins/libactions.so 14 10485773 actions Action Buttons Log out, lock or other system actions

What is your OS and version? I’m surprised you don’t have a dbus.service; what you’re showing is dbus-broker, which could behave differently. Though it doesn’t appear to be on the system that’s working, so maybe that’s a red herring too.

I don’t know about your Jupyter issues, but Jupyter is a web application, so there’s no X11/VNC/dbus involved with it.

OS = Rocky-9.1
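
That likely explains the missing dbus.service: on Rocky/RHEL 9, dbus-broker is the default system bus implementation, and the dbus-broker.service unit file shown earlier ends with `Alias=dbus.service`, which is why `systemctl status dbus` reported dbus-broker.service. The ps output from the working cluster shows dbus-broker-launch as well, so both clusters appear to run the same system bus. A bash sketch that lists the aliases a unit file declares; the function name is hypothetical.

```shell
# Hypothetical helper: print the names listed in Alias= lines of a unit file.
# Alias= may hold several space-separated names, so split them one per line.
unit_aliases() {
  sed -n 's/^Alias=//p' "$1" | tr ' ' '\n' | sed '/^$/d'
}

# Example: unit_aliases /usr/lib/systemd/system/dbus-broker.service
```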

Checking the reverse proxy, since neither websockify nor Jupyter is getting a connection:

nc -l 5432

This address produces output in the listener:

https://atlas-ood.hpc.msstate.edu/node/atlas-0022.hpc.msstate.edu/5432

This one doesn’t

https://atlas-ood.hpc.msstate.edu/node/atlas-0022/5432

This was the right place to look. My OOD portal file did not match the cluster file: the portal file had the FQDN in its regex, but the hosts were reporting their short names:

host=$(hostname | awk -F. '{print $1}' | tr A-Z a-z )
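
One way to catch this kind of mismatch before users do is to push the hostname a job will report through the same regex the portal uses. The bash sketch below repeats the short-name transformation from that line and tests the result against a regex. Both regexes are made up for illustration (your actual host_regex isn’t shown here), the function names are hypothetical, and bash’s extended regular expressions only approximate the PCRE that Apache applies, which should not matter for simple patterns like these.

```shell
# Same transformation as the cluster's host= line: strip the domain, lowercase.
short_host() {
  printf '%s\n' "$1" | awk -F. '{print $1}' | tr 'A-Z' 'a-z'
}

# Succeeds if the whole host string matches the regex (anchored at both ends).
host_allowed() {
  local re="^(${1})$"
  [[ $2 =~ $re ]]
}

# Hypothetical regexes: FQDN only vs. short name with an optional domain.
fqdn_only='atlas-[0-9]+\.hpc\.msstate\.edu'
either='atlas-[0-9]+(\.hpc\.msstate\.edu)?'
host="$(short_host atlas-0022.hpc.msstate.edu)"

if host_allowed "$fqdn_only" "$host"; then echo "fqdn_only accepts $host"; else echo "fqdn_only rejects $host"; fi
if host_allowed "$either" "$host"; then echo "either accepts $host"; else echo "either rejects $host"; fi
```

With these inputs it prints `fqdn_only rejects atlas-0022` and `either accepts atlas-0022`, the same split as the two /node/ URLs above.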

:person_facepalming:t3:

That fixed the desktop and Jupyter, but not RStudio:

xfsettingsd: No window manager registered on screen 0.

(xfsettingsd:105274): xfsettingsd-WARNING **: 15:48:12.474: Failed to get the _NET_NUMBER_OF_DESKTOPS property.

Any thoughts on whether this ties back to the display red herring?

RStudio is also a web application (which doesn’t require X11/VNC/dbus), unless you’re running the other variant, whose name I can’t remember off the top of my head.

We are running the other version, I believe:

#
# Launch Xfce Window Manager and Panel
#

(
  export BASE="/apps/other/ood-depends/"
  export PATH="$BASE/xfce-4.18.0/bin:$BASE/xfce-4.18.0/sbin:$BASE/turbovnc-3.1.1/bin:$BASE/nmap-7.94/bin:$BASE/nmap-7.94/contrib/bin:$PATH"
  export LD_LIBRARY_PATH="$BASE/xfce-4.18.0/lib:$BASE/xfce-4.18.0/contrib/lib64:$BASE/turbovnc-3.1.1/contrib/lib64:$BASE/nmap-7.94:$BASE/nmap-7.94/contrib/lib:$LD_LIBRARY_PATH"
  unset BASE
  export SEND_256_COLORS_TO_REMOTE=1
  export XDG_CONFIG_HOME="<%= session.staged_root.join("config") %>"
  export XDG_DATA_HOME="<%= session.staged_root.join("share") %>"
  export XDG_CACHE_HOME="$(mktemp -d)"
  export $(dbus-launch)

  module restore
  set -x
  xfwm4 --compositor=off --daemon --sm-client-disable
  xsetroot -solid "#D3D3D3"
  xfsettingsd --sm-client-disable
  xfce4-panel --sm-client-disable
) &

#

# Load the required environment
module load rstudio/2024.04.0

# Launch
module list
set -x
rstudio

No, but searching for that error suggests XFCE picks up saved session files in ~/.cache/sessions, so it may be worth backing up and removing those from your home directory.

Removed the cache; still getting the same error:

xfsettingsd: No window manager registered on screen 0.

(xfsettingsd:105924): xfsettingsd-WARNING **: 16:00:18.534: Failed to get the _NET_NUMBER_OF_DESKTOPS property.
[105929:0503/160021.656776:FATAL:bus.cc(1246)] D-Bus connection was disconnected. Aborting.
/home/huston.rogers/ondemand/data/sys/dashboard/batch_connect/sys/rstudio-2024-04-0/output/082b114b-7c83-483f-b271-598ab3253883/script.sh: line 41: 105929 Trace/breakpoint trap   (core dumped) rstudio
Cleaning up...
Killing Xvnc process ID 105882
+ xfce4-panel --sm-client-disable
xfce4-panel: Cannot open display: .
Type "xfce4-panel --help" for usage

Almost as if the display isn’t getting assigned.
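
Two things in the script and log may be worth checking. First, `export $(dbus-launch)` runs inside the `( ... ) &` subshell, so the DBUS_SESSION_BUS_ADDRESS it sets is not visible to `rstudio` in the parent shell, which uses whatever bus address (if any) it inherited. Second, `Cannot open display: .` suggests xfce4-panel saw an empty DISPLAY. A small bash guard you could call right before launching `rstudio` or the XFCE pieces makes either condition explicit; the function name is hypothetical.

```shell
# Hypothetical guard: report missing session variables before starting
# X11 or D-Bus clients such as xfce4-panel or rstudio.
check_session_env() {
  local var missing=0
  for var in DISPLAY DBUS_SESSION_BUS_ADDRESS; do
    # ${!var} is bash indirect expansion: the value of the variable named by $var.
    if [ -z "${!var:-}" ]; then
      echo "check_session_env: $var is not set" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage in the template, right before the launch:
#   check_session_env || exit 1
#   rstudio
```

It only checks that the variables are set, not that Xvnc or the session bus is still answering.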