Problems with interactive Matlab app window

Hello, I’m having trouble getting the expected behavior with the window in a Matlab interactive app. Any help or ideas would be greatly appreciated!

My problem appears similar to the issue described in the thread “Matlab window problem”: the standard window features don’t work, i.e. the window won’t resize and the min/max/close buttons are missing. However, I’m seeing different errors in the log.

I’m using a simplified version of OSC’s script.sh.erb from the OSC/bc_osc_matlab GitHub repo:

#!/usr/bin/env bash

# Clean the environment
module purge

# Set working directory to home directory
cd "${HOME}"

# Launch Xfce Window Manager and Panel

(
  export SEND_256_COLORS_TO_REMOTE=1
  export XDG_CONFIG_HOME="<%= session.staged_root.join("config") %>"
  export XDG_DATA_HOME="<%= session.staged_root.join("share") %>"
  export XDG_CACHE_HOME="$(mktemp -d)"
  set -x
  xfwm4 --compositor=off --sm-client-disable
  xsetroot -solid "#D3D3D3"
  xfsettingsd --sm-client-disable
  xfce4-panel --sm-client-disable
) &

# Start MATLAB

# Load the required environment
module load matlab

# Launch MATLAB
module list
set -x
matlab -desktop

Here’s output.log:

Setting VNC password...
Starting VNC server...

Desktop 'TurboVNC: <hostname>:1 (<username>)' started on display <hostname>:1

Log file is vnc.log
Successfully started VNC server on <hostname>:5901...
Script starting...
Starting websocket server...
+ xfwm4 --compositor=off --sm-client-disable
Currently Loaded Modulefiles:
 1) matlab/R2020a  
+ matlab -desktop
WebSocket server settings:
  - Listen on :9882
  - No SSL/TLS support (no cert file)
  - Backgrounding (daemon)
Scanning VNC log file for user authentications...
Generating connection YAML file...
+ xsetroot -solid '#D3D3D3'
+ xfsettingsd --sm-client-disable
xfsettingsd: Could not connect: No such file or directory.

(xfsettingsd:718476): xfsettingsd-ERROR **: 10:43:40.190: Failed to connect to the dbus session bus.
/home/<username>/ondemand/data/sys/dashboard/batch_connect/sys/matlab/output/b2b33142-b9b3-409f-85e1-5a0aeb6b8dae/script.sh: line 22: 718476 Trace/breakpoint trap   (core dumped) xfsettingsd --sm-client-disable
+ xfce4-panel --sm-client-disable

(xfce4-panel:718489): xfce4-panel-WARNING **: 10:43:40.376: Failed to connect to the D-BUS session bus: Could not connect: No such file or directory

(xfce4-panel:718489): xfce4-panel-CRITICAL **: 10:43:40.377: Name org.xfce.Panel lost on the message dbus, exiting.
xfce4-panel: There is already a running instance

MATLAB is selecting SOFTWARE OPENGL rendering.
Failed to create secure directory (/run/user/<uid>/pulse): No such file or directory
Setting VNC password...
Generating connection YAML file...

Can I get a dummy check from someone who has a working setup? Did I miss something?

Here’s my setup:
OS: latest RHEL 8.7, with regular updates
Slurm 20.11
Xfce 4.16 installed from EPEL 8 repo
OOD 2.0.31

Thanks for any assistance you can provide!

Hi Daniel.

Thanks for your post. Just wanted to let you know that I am looking into this.

Thanks,
-gerald

Hi Daniel.

Can you please start your MATLAB job and send me the View Only/Shareable link?

I want to see if you & I are seeing the same thing.

Thanks,
-gerald

Thanks so much Gerald!

Unfortunately, the link won’t work because I’m on a restricted network. But here’s a screenshot of what I see (blank this time, but other times I can see everything except the window control buttons):

Also, another data point: through testing, I’ve found that if I have a separate SSH session on the compute node going at the same time, launching an interactive Matlab session works as expected, the window is responsive and there aren’t any errors in the log.

Here’s a screenshot when that separate SSH session is active

Here’s that output.log:

Setting VNC password...
Starting VNC server...

Desktop 'TurboVNC: <hostname>:1 (<username>)' started on display <hostname>:1

Log file is vnc.log
Successfully started VNC server on <hostname>:5901...
Script starting...
Starting websocket server...
+ xfwm4 --compositor=off --sm-client-disable
Currently Loaded Modulefiles:
 1) matlab/R2020a  
+ matlab -desktop
WebSocket server settings:
  - Listen on :9183
  - No SSL/TLS support (no cert file)
  - Backgrounding (daemon)
Scanning VNC log file for user authentications...
Generating connection YAML file...
MATLAB is selecting SOFTWARE OPENGL rendering.
Setting VNC password...
Generating connection YAML file...

Hi Daniel.

Here are some screen captures from mine using the script that you provided.

The original startup.

I click the “RESTORE” control button on the Matlab window and get the following.

You will notice that all the control buttons appear to be missing. They aren’t actually missing; they are off screen to the right. If I drag the window to the left, the control buttons become visible.

I would like to propose that you join our Open Office Hours tomorrow at 11:15 Eastern. There will be more folks on the call, and we can also have you share your screen so we can help troubleshoot. The meeting information is:

Topic: Open OnDemand Open Office Hours

Time: This is a recurring meeting Meet anytime

Join Zoom Meeting

Meeting ID: 962 9856 8321

Password: 424991

One tap mobile

+16513728299,96298568321#,0#,424991# US (Minnesota)

+13017158592,96298568321#,0#,424991# US (Washington DC)

Dial by your location

+1 651 372 8299 US (Minnesota)

+1 301 715 8592 US (Washington DC)

+1 312 626 6799 US (Chicago)

+1 646 876 9923 US (New York)

+1 669 900 6833 US (San Jose)

+1 253 215 8782 US (Tacoma)

+1 346 248 7799 US (Houston)

+1 408 638 0968 US (San Jose)

Find your local number: Zoom International Dial-in Numbers - Zoom

Join by SIP

96298568321@zoomcrc.com

Join by H.323

162.255.37.11 (US West)

162.255.36.11 (US East)

115.114.131.7 (India Mumbai)

115.114.115.7 (India Hyderabad)

213.19.144.110 (Amsterdam Netherlands)

213.244.140.110 (Germany)

103.122.166.55 (Australia Sydney)

103.122.167.55 (Australia Melbourne)

64.211.144.160 (Brazil)

69.174.57.160 (Canada Toronto)

65.39.152.160 (Canada Vancouver)

207.226.132.110 (Japan Tokyo)

149.137.24.110 (Japan Osaka)



While you can totally come to office hours - I’d try setting your XDG_RUNTIME_DIR to something within $TMPDIR. We’ve had similar issues with Slurm and XDG settings.

Two messages in your output.log are very important: xfsettingsd is core dumping, and /run/user/$UID can’t be used during the job.

You can set export XDG_RUNTIME_DIR for the entire cluster in the same place you may have set script_wrapper in your cluster.d file.

If you haven’t set a script_wrapper you can see this example here for the same.

https://osc.github.io/ood-documentation/latest/installation/cluster-config-schema.html?highlight=module%20restore
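
One caveat when relocating XDG_RUNTIME_DIR: the XDG Base Directory specification expects the directory to exist, be owned by the user, and have mode 0700, so exporting the variable alone may not be enough. Here is a minimal sketch, assuming a POSIX shell and that $TMPDIR (or /tmp) is writable inside the job. The helper name make_runtime_dir and the xdg_runtime_<uid> directory name are made up for illustration and are not part of Open OnDemand:

```shell
# Sketch only: create a private runtime directory and export XDG_RUNTIME_DIR.
# make_runtime_dir is a hypothetical helper name, not an Open OnDemand function.
make_runtime_dir() {
  # Base location: first argument, else $TMPDIR, else /tmp (assumption).
  xrd_base="${1:-${TMPDIR:-/tmp}}"
  xrd_dir="$xrd_base/xdg_runtime_$(id -u)"

  # The directory must exist and be accessible only by its owner.
  mkdir -p "$xrd_dir" || return 1
  chmod 700 "$xrd_dir" || return 1

  XDG_RUNTIME_DIR="$xrd_dir"
  export XDG_RUNTIME_DIR
}
```

Calling make_runtime_dir in a before_script or script_wrapper would replace a bare export line. Whether this is sufficient on a given site probably depends on how Slurm and systemd-logind are configured there.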

Here’s the cluster file:

---
    v2:
      metadata:
        title: "<title>"
      login:
        host: "<FQDN>"
      job:
        adapter: "slurm"
        bin: "/usr/bin"
        conf: "/etc/slurm/slurm.conf"
      batch_connect:
        basic:
          script_wrapper: |
            module purge
            %s
        vnc:
          script_wrapper: |
            module purge
            export PATH="/opt/TurboVNC/bin/:$PATH"
            export WEBSOCKIFY_CMD="/usr/bin/websockify"
            %s
          min_port: 9000
          max_port: 9999

Here’s where I tried redefining the runtime directory variable in script.sh.erb (right at the top):

#!/usr/bin/env bash
unset XDG_RUNTIME_DIR
export XDG_RUNTIME_DIR="/tmp/user/$(id -u)"

# Clean the environment
module purge

# Set working directory to home directory
cd "${HOME}"

# Launch Xfce Window Manager and Panel
#

(
  export SEND_256_COLORS_TO_REMOTE=1
  export XDG_CONFIG_HOME="<%= session.staged_root.join("config") %>"
  export XDG_DATA_HOME="<%= session.staged_root.join("share") %>"
  export XDG_CACHE_HOME="$(mktemp -d)"
  set -x
  xfwm4 --compositor=off --sm-client-disable
  xsetroot -solid "#D3D3D3"
  xfsettingsd --sm-client-disable
  xfce4-panel --sm-client-disable
) &



#
# Start MATLAB
#

# Load the required environment
module load matlab

# Launch MATLAB
module list
set -x
matlab -desktop

Here, try setting it in the before_script so that it gets set before vncserver boots up.

---
    v2:
      metadata:
        title: "<title>"
      login:
        host: "<FQDN>"
      job:
        adapter: "slurm"
        bin: "/usr/bin"
        conf: "/etc/slurm/slurm.conf"
      batch_connect:
        basic:
          script_wrapper: |
            module purge
            %s
        vnc:
          before_script: |
            # Export the module function if it exists
            [[ $(type -t module) == "function"  ]] && export -f module

            # Slurm doesn't like /var/run/$(id -u)
            export XDG_RUNTIME_DIR="$TMPDIR/xdg_runtime"
          script_wrapper: |
            module purge
            export PATH="/opt/TurboVNC/bin/:$PATH"
            export WEBSOCKIFY_CMD="/usr/bin/websockify"
            %s
          min_port: 9000
          max_port: 9999

Unfortunately, it’s still not working, though an echo in script.sh.erb shows the variable is being set as we want:

... <truncated> ...
# Launch MATLAB
module list
set -x
echo $XDG_RUNTIME_DIR
matlab -desktop

Here’s output.log:

Setting VNC password...
Starting VNC server...

Desktop 'TurboVNC: <hostname>:1 (<username>)' started on display <hostname>:1

Log file is vnc.log
Successfully started VNC server on <hostname>:5901...
Script starting...
Starting websocket server...
+ xfwm4 --compositor=off --sm-client-disable
Currently Loaded Modulefiles:
 1) matlab/R2020a  
+ echo /tmp/xdg_runtime
/tmp/xdg_runtime
+ matlab -desktop
WebSocket server settings:
  - Listen on :9734
  - No SSL/TLS support (no cert file)
  - Backgrounding (daemon)
Scanning VNC log file for user authentications...
Generating connection YAML file...
MATLAB is selecting SOFTWARE OPENGL rendering.
+ xsetroot -solid '#D3D3D3'
+ xfsettingsd --sm-client-disable
xfsettingsd: Could not connect: No such file or directory.

(xfsettingsd:907982): xfsettingsd-ERROR **: 15:24:16.079: Failed to connect to the dbus session bus.
/home/<username>/ondemand/data/sys/dashboard/batch_connect/sys/matlab/output/68ad510a-ee1e-476d-a8ce-41e1638c3424/script.sh: line 22: 907982 Trace/breakpoint trap   (core dumped) xfsettingsd --sm-client-disable
+ xfce4-panel --sm-client-disable

(xfce4-panel:907999): xfce4-panel-WARNING **: 15:24:16.382: Failed to connect to the D-BUS session bus: Could not connect: No such file or directory

(xfce4-panel:907999): xfce4-panel-CRITICAL **: 15:24:16.383: Name org.xfce.Panel lost on the message dbus, exiting.
xfce4-panel: There is already a running instance

Setting VNC password...
Generating connection YAML file...

I think the issue here may be that the variable isn’t set “early” enough, though I’m not quite sure where to set it. I tried ~/.pam_environment and /etc/environment based on the pam_env man page, but those didn’t effect any change with the Slurm submissions; I only noticed the updated variable if I logged in directly with SSH.

OK, maybe try to export it in the script_wrapper. When you check job_script_content (the script we submit to slurm) this will be one of the first things it does.

---
    v2:
      metadata:
        title: "<title>"
      login:
        host: "<FQDN>"
      job:
        adapter: "slurm"
        bin: "/usr/bin"
        conf: "/etc/slurm/slurm.conf"
      batch_connect:
        basic:
          script_wrapper: |
            module purge
            %s
        vnc:
          script_wrapper: |
            module purge
            export PATH="/opt/TurboVNC/bin/:$PATH"
            export WEBSOCKIFY_CMD="/usr/bin/websockify"

            # Slurm doesn't like /var/run/$(id -u)
            export XDG_RUNTIME_DIR="$TMPDIR/xdg_runtime"
            %s
          min_port: 9000
          max_port: 9999

Thanks for the help Jeff. It’s still not working, and I think the journal shows why - the login which triggers the user runtime directory (URD) creation occurs before any slurm script is actually run:

Feb 16 21:31:18 su[2121572]: (to <username>) root on none
Feb 16 21:31:18 systemd[1]: Created slice User Slice of UID <uid>.
Feb 16 21:31:18 systemd[1]: Starting User runtime directory /run/user/<uid>...
Feb 16 21:31:18 systemd[1]: Started User runtime directory /run/user/<uid>.
Feb 16 21:31:18 systemd[1]: Starting User Manager for UID <uid>...
Feb 16 21:31:18 systemd[2121576]: pam_unix(systemd-user:session): session opened for user <username> by (uid=0)
Feb 16 21:31:19 systemd[2121576]: Started Mark boot as successful after the user session has run 2 minutes.
Feb 16 21:31:19 systemd[2121576]: Reached target Paths.
Feb 16 21:31:19 systemd[2121576]: Reached target Timers.
Feb 16 21:31:19 systemd[2121576]: Listening on Sound System.
Feb 16 21:31:19 systemd[2121576]: Listening on Multimedia System.
Feb 16 21:31:19 systemd[2121576]: Starting D-Bus User Message Bus Socket.
Feb 16 21:31:19 systemd[2121576]: Listening on D-Bus User Message Bus Socket.
Feb 16 21:31:19 systemd[2121576]: Reached target Sockets.
Feb 16 21:31:19 systemd[2121576]: Reached target Basic System.
Feb 16 21:31:19 systemd[1]: Started User Manager for UID <uid>.
Feb 16 21:31:19 systemd[2121576]: Starting Sound Service...
Feb 16 21:31:19 systemd[1]: Started Session c23 of user <username>.
Feb 16 21:31:19 su[2121572]: pam_unix(su-l:session): session opened for user <username> by (uid=0)
Feb 16 21:31:19 su[2121572]: pam_unix(su-l:session): session closed for user <username>
Feb 16 21:31:19 systemd[1]: session-c23.scope: Succeeded.
Feb 16 21:31:20 slurmd[7631]: slurmd: Launching batch job 404 for UID <uid>
<... truncated ...>
Feb 16 21:31:29 systemd[1]: Stopping User Manager for UID <uid>...
Feb 16 21:31:29 systemd[2121576]: Stopping D-Bus User Message Bus...
Feb 16 21:31:29 systemd[2121576]: Stopping Accessibility services bus...
Feb 16 21:31:29 systemd[2121576]: Stopped target Default.
Feb 16 21:31:29 systemd[2121576]: Stopping Sound Service...
Feb 16 21:31:29 systemd[2121576]: Stopped D-Bus User Message Bus.
Feb 16 21:31:29 systemd[2121576]: Stopped Accessibility services bus.
Feb 16 21:31:29 systemd[2121576]: Stopped Sound Service.
Feb 16 21:31:29 systemd[2121576]: Stopped target Basic System.
Feb 16 21:31:29 systemd[2121576]: Stopped target Sockets.
Feb 16 21:31:29 systemd[2121576]: Closed Multimedia System.
Feb 16 21:31:29 systemd[2121576]: Closed D-Bus User Message Bus Socket.
Feb 16 21:31:29 systemd[2121576]: Stopped target Paths.
Feb 16 21:31:29 systemd[2121576]: Stopped target Timers.
Feb 16 21:31:29 systemd[2121576]: Stopped Mark boot as successful after the user session has run 2 minutes.
Feb 16 21:31:29 systemd[2121576]: Closed Sound System.
Feb 16 21:31:29 systemd[2121576]: Reached target Shutdown.
Feb 16 21:31:29 systemd[2121576]: Started Exit the Session.
Feb 16 21:31:29 systemd[2121576]: Reached target Exit the Session.
Feb 16 21:31:29 systemd[1]: user@<uid>.service: Succeeded.
Feb 16 21:31:29 systemd[1]: Stopped User Manager for UID <uid>.
Feb 16 21:31:29 systemd[1]: Stopping User runtime directory /run/user/<uid>...
Feb 16 21:31:29 systemd[1]: run-user-<uid>.mount: Succeeded.
Feb 16 21:31:29 systemd[1]: user-runtime-dir@<uid>.service: Succeeded.
Feb 16 21:31:29 systemd[1]: Stopped User runtime directory /run/user/<uid>.
Feb 16 21:31:29 systemd[1]: Removed slice User Slice of UID <uid>

I’ve spent some time digging into how to modify XDG_RUNTIME_DIR as early as possible, trying pam_env and /etc/systemd/system.conf, all with no luck. The logs confirm that the variable gets updated, both before and after pam_systemd actually creates the URD, but the URD location doesn’t change; it’s always /run/user/<uid>.

But taking a step back, I don’t think the URD location matters; xfwm4 is having problems because the URD gets deleted shortly after its creation, and a deep dive through some man pages indicates that’s to be expected. At job submission, slurmd launches the job under the user’s UID by executing an su, triggering a user login session and the resulting URD creation. But the actual job processes run under the slurmd service session, not a user session. So regardless of where the URD is, it’s going to be deleted after 10 seconds (a time limit controlled by UserStopDelaySec in logind.conf).

But that raises the question: how has anyone been able to get this working? The only solution I can see is to somehow keep the URD around, either by setting a long UserStopDelaySec value or by enabling user lingering (with loginctl enable-linger).
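
To confirm this theory from inside a job, one could check whether the runtime directory and its bus socket are still present. Below is a rough sketch, assuming a POSIX shell; runtime_dir_status is a hypothetical helper name, and "bus" is the socket name that the systemd user D-Bus socket unit conventionally creates under XDG_RUNTIME_DIR:

```shell
# Sketch only: classify the state of a runtime directory and its D-Bus socket.
# runtime_dir_status is a hypothetical helper name used for illustration.
runtime_dir_status() {
  # Directory to inspect: first argument, else the current XDG_RUNTIME_DIR.
  rds_dir="${1:-${XDG_RUNTIME_DIR:-}}"

  if [ -z "$rds_dir" ]; then
    echo "unset"          # no runtime directory configured at all
  elif [ ! -d "$rds_dir" ]; then
    echo "missing-dir"    # directory was never created or has been removed
  elif [ ! -S "$rds_dir/bus" ]; then
    echo "missing-bus"    # directory exists but no session bus socket
  else
    echo "ok"
  fi
}
```

If the explanation above is right, calling runtime_dir_status /run/user/$(id -u) at the very start of the job script and again some 15 seconds later should show the state change from ok to missing-dir, unless lingering is enabled or another login session (such as a separate SSH session) keeps the directory alive.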

Hi Jeff and team,
If you’re still interested in looking at this, I’d love to get your thoughts. If you’re available, I could set up a call over Microsoft Teams, which would allow me to share a screen and give you more visibility into what I’m seeing.
Thanks again for any assistance!

Yes we can meet, send an invite to johrstrom@osc.edu.

I’d be interested in seeing more journalctl output.

I’m now wondering if it’s perhaps because you’re in a cgroup when you’re in a Slurm job. When you shell in, I doubt you get the same cgroup settings (if any at all).

So I’m wondering if there’s a permission type issue here or similar. I guess I’d also want to look at the logs in /var/log/audit/ if you have SELinux enabled.

I had forgotten that our newest cluster is RHEL/8. We haven’t migrated any of our applications to it yet, and lo and behold - I can replicate the same exact error.

Not a similar error - it seems to be the exact same.

xfwm4: Unknown option --daemon.
Type "xfwm4 --help" for usage.
+ xsetroot -solid '#D3D3D3'
+ xfsettingsd --sm-client-disable
Killing Xvnc process ID 4155812
xfsettingsd: Error spawning command line “dbus-launch --autolaunch=543f64dd20ca40f69108531cad975036 --binary-syntax --close-stderr”: Child process exited with code 1.

(xfsettingsd:4155908): xfsettingsd-ERROR **: 16:14:54.568: Failed to connect to the dbus session bus.
/users/PZS0714/johrstrom/ondemand/data/sys/dashboard/batch_connect/dev/matlab/output/8a921b60-410d-40c1-a2c8-45e8ce0bac42/script.sh: line 25: 4155908 Trace/breakpoint trap   (core dumped) xfsettingsd --sm-client-disable

So, given that I can replicate so easily - it seems that this is a common/shared issue with RHEL/8.

I think I’ve made some progress on this. Try this script.sh.erb. You’ll note that we have to background a few things instead of the entire block. There are also some sleeps that are a bit hacky, to be sure, but I’ve found that we need things to boot up properly before the next command is issued.

#!/usr/bin/env bash
unset XDG_RUNTIME_DIR
export XDG_RUNTIME_DIR="/tmp/user/$(id -u)"

# Clean the environment
module purge

# Set working directory to home directory
cd "${HOME}"

# Launch Xfce Window Manager and Panel
#

export SEND_256_COLORS_TO_REMOTE=1
export XDG_CONFIG_HOME="<%= session.staged_root.join("config") %>"
export XDG_DATA_HOME="<%= session.staged_root.join("share") %>"
export XDG_CACHE_HOME="$(mktemp -d)"
module restore
set -x
xfwm4 --sm-client-disable &
sleep 5
xsetroot -solid "#D3D3D3"
xfsettingsd --daemon --sm-client-disable
xfce4-panel --sm-client-disable &

sleep 5



#
# Start MATLAB
#

# Load the required environment
module load matlab

# Launch MATLAB
module list
set -x
matlab -desktop
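
As a possible alternative to the fixed sleeps in the script above, the waits could be expressed as polling with a timeout. This is only a sketch, assuming a POSIX shell; wait_for is a hypothetical helper name, not an Xfce or Open OnDemand tool:

```shell
# Sketch only: retry a command once per second until it succeeds
# or the timeout (in seconds) is reached.
# wait_for is a hypothetical helper name used for illustration.
wait_for() {
  wf_timeout="$1"
  shift
  wf_elapsed=0

  # Run the remaining arguments as a command, discarding its output.
  until "$@" >/dev/null 2>&1; do
    # Give up once the timeout has been reached.
    [ "$wf_elapsed" -ge "$wf_timeout" ] && return 1
    sleep 1
    wf_elapsed=$((wf_elapsed + 1))
  done
  return 0
}
```

For example, after backgrounding xfwm4 one might run wait_for 10 pgrep -u "$(id -u)" -x xfwm4 in place of sleep 5. Note that this only checks that the process exists, not that the window manager has finished initializing, so a short sleep may still be needed afterwards.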

Jeff, thank you very much for your engagement on this!

Those edits didn’t work for me, but I was able to get a fix.

It had to do with the dbus-launch command you first identified on our call, when reviewing the bc_desktop/template/desktops/xfce.sh script. I’m not sure what made me conclude it wasn’t working during the call, but it’s definitely working now.

Here’s my script.sh.erb (the notable addition being eval $(dbus-launch --sh-syntax)):

#!/usr/bin/env bash

# Clean the environment
module purge

# Set working directory to home directory
cd "${HOME}"

# Launch Xfce Window Manager and Panel
#

(
  export XDG_CONFIG_HOME="<%= session.staged_root.join("config") %>"
  export XDG_DATA_HOME="<%= session.staged_root.join("share") %>"
  export XDG_CACHE_HOME="$(mktemp -d)"
  module restore
#  set -x
  eval $(dbus-launch --sh-syntax)
  xfwm4 --compositor=off --sm-client-disable
  xsetroot -solid "#D3D3D3"
  xfsettingsd --daemon --sm-client-disable
  xfce4-panel --sm-client-disable
) &

#
# Start MATLAB
#

# Load the required environment
module load matlab

# Launch MATLAB
module list
set -x
matlab -desktop

I’m still a little hazy on the full picture, but the result is that the Xfce components now have a dbus session interface running in the slurmd cgroup, rather than trying to use the short-lived connection in the user cgroup (XDG_RUNTIME_DIR). And the window control features work as expected!
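
One thing that the dbus-launch approach leaves open is cleanup: a dbus-daemon started this way may outlive the shell that launched it unless something stops it. Here is a sketch of one way to wrap the launch, assuming a POSIX shell. The names start_job_bus, stop_job_bus, and DBUS_LAUNCH_CMD are hypothetical and exist only to make the example self-contained; DBUS_SESSION_BUS_ADDRESS and DBUS_SESSION_BUS_PID are the variables that dbus-launch --sh-syntax prints:

```shell
# Sketch only: start a private session bus for this job and stop it later.
# start_job_bus, stop_job_bus and DBUS_LAUNCH_CMD are hypothetical names.
start_job_bus() {
  # Forget any inherited bus so a failed launch cannot go unnoticed.
  unset DBUS_SESSION_BUS_ADDRESS DBUS_SESSION_BUS_PID

  # dbus-launch --sh-syntax prints shell assignments for
  # DBUS_SESSION_BUS_ADDRESS and DBUS_SESSION_BUS_PID; eval imports them.
  eval "$("${DBUS_LAUNCH_CMD:-dbus-launch}" --sh-syntax)"

  # Fail if no bus address was obtained.
  [ -n "${DBUS_SESSION_BUS_ADDRESS:-}" ] || return 1
  export DBUS_SESSION_BUS_ADDRESS
}

stop_job_bus() {
  # Terminate the daemon started above, if its PID was recorded.
  if [ -n "${DBUS_SESSION_BUS_PID:-}" ]; then
    kill "$DBUS_SESSION_BUS_PID" 2>/dev/null || true
  fi
}
```

Inside the parenthesized Xfce block, start_job_bus && trap stop_job_bus EXIT could stand in for the bare eval line. On clusters where Slurm already kills every process in the job's cgroup at job end, the explicit cleanup is likely redundant.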

We recently had to revisit this due to a separate issue, and time has provided a little insight (I think!).
To summarize, for future reference:

To get a working setup on RHEL 8 with Xfce 4.16.1 (from EPEL), two changes are needed from OSC’s baseline script.sh.erb:

  1. A message bus for the various Xfce components has to be manually launched (dbus-launch)
  2. The window manager xfwm4 has to be executed in the background; the --daemon option is no longer valid

Here’s the parenthesized block of Xfce commands we’re using with success:

# Launch Xfce Window Manager and Panel
#
(
  export XDG_CONFIG_HOME="<%= session.staged_root.join("config") %>"
  export XDG_DATA_HOME="<%= session.staged_root.join("share") %>"
  export XDG_CACHE_HOME="$(mktemp -d)"
  eval $(dbus-launch --sh-syntax)
  xfwm4 --compositor=off --sm-client-disable &
  xsetroot -solid "#D3D3D3"
  xfsettingsd --daemon --sm-client-disable
  xfce4-panel --sm-client-disable
) &
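
For sites that run the same script.sh.erb on nodes with different Xfce versions, the second change could be made conditional. This is a sketch only, assuming a POSIX shell and assuming that xfwm4 builds which still accept --daemon list it in their --help output (not verified here); supports_flag is a hypothetical helper name:

```shell
# Sketch only: report whether a command's --help output mentions a flag.
# supports_flag is a hypothetical helper name used for illustration.
supports_flag() {
  # $1: command to query, $2: flag to look for (e.g. --daemon)
  "$1" --help 2>&1 | grep -q -- "$2"
}

# Possible use inside the Xfce block (left commented out in this sketch):
# if supports_flag xfwm4 --daemon; then
#   xfwm4 --compositor=off --daemon --sm-client-disable
# else
#   xfwm4 --compositor=off --sm-client-disable &
# fi
```

The check keeps the old invocation where it still works and falls back to backgrounding xfwm4 where the option has been removed.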