Issues running VMD on OOD on a cluster

We have a cluster built on RHEL 8 that uses Slurm for scheduling. We are getting an error with bc_osc_vmd. We meet the minimum prerequisites: the Xfce desktop environment, TurboVNC, and websockify are installed with yum, and VirtualGL and VMD are loaded as modules. We have not used Xfce before.

bc_osc_vmd fails with the following output in the log:

/home/wew/ondemand/data/sys/dashboard/batch_connect/dev/bc_osc_vmd/output/8265f977-28b7-4ca0-9c4d-7891438b8fd6/
Setting VNC password…
Starting VNC server…

Desktop ‘TurboVNC: cn4:2 (wew)’ started on display cn4:2

Log file is vnc.log
Successfully started VNC server on cn4:5902…
Script starting…
Starting websocket server…
Starting /bin/xfwm4
The system default contains no modules
(env var: LMOD_SYSTEM_DEFAULT_MODULES is empty)
No changes in loaded modules

+ xfwm4 --compositor=off --sm-client-disable
WARNING: no 'numpy' module, HyBi protocol will be slower
WebSocket server settings:
  - Listen on :6213
  - Flash security policy server
  - No SSL/TLS support (no cert file)
  - Backgrounding (daemon)
Scanning VNC log file for user authentications…
Generating connection YAML file…

(xfwm4:366366): xfwm4-CRITICAL **: 14:25:21.750: Xfconf could not be initialized

(xfwm4:366366): xfwm4-WARNING **: 14:25:21.750: Missing data from default files

+ xsetroot -solid '#D3D3D3'
+ xfsettingsd --sm-client-disable
+ xfce4-panel --sm-client-disable
xfsettingsd: Could not connect: No such file or directory.

(xfsettingsd:366409): xfsettingsd-ERROR **: 14:25:21.779: Failed to connect to the dbus session bus.

(xfce4-panel:366410): xfce4-panel-WARNING **: 14:25:21.782: Failed to connect to the D-BUS session bus: Could not connect: No such file or directory
xfce4-panel: There is already a running instance

Currently Loaded Modules:

  1) anaconda3/2020.11   2) virtualgl/2.6.6   3) vmd/1.9.4
+ xfce4-terminal -e 'vglrun vmd' -T 'VMD Terminal' --disable-server
Failed to connect to session manager: Failed to connect to the session manager: SESSION_MANAGER environment variable not defined
Cleaning up…
Killing Xvnc process ID 366303

Any idea where we should start looking?

Check for missing packages on the system where Xfce is supposed to run. xfce4-terminal, for example, is its own package, as is xfwm4, and so on. If those packages are installed, make sure the PATH available to the script includes /usr/bin, which is where most, if not all, of the executables are installed.

Also, did you do

yum groupinstall Xfce

If not, you’re likely to be finding and adding packages for quite a while.
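One way to run the package and PATH check described above from inside the job itself is a small helper like this. It is only a sketch: the function name check_cmds is made up for illustration, and the command list is simply taken from what the job log shows being launched.

```shell
#!/bin/sh
# Report which of the given commands can be found on the current PATH.
# Prints one line per command; returns 1 if any command is missing.
# (check_cmds is a hypothetical helper name, not part of OnDemand.)
check_cmds() {
  local missing=0 cmd
  for cmd in "$@"; do
    if command -v "$cmd" >/dev/null 2>&1; then
      echo "found:   $cmd -> $(command -v "$cmd")"
    else
      echo "MISSING: $cmd"
      missing=1
    fi
  done
  return "$missing"
}

# Commands that appear in the job log earlier in this thread.
check_cmds xfwm4 xsetroot xfsettingsd xfce4-panel xfce4-terminal dbus-launch \
  || echo "At least one command is not on PATH: $PATH"
```

Pasting this near the top of the job script, after the module loads, would show the PATH the script really sees, which can differ from a login shell.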

Cheers,

Ric

Xfce was group installed. I see xfce4-terminal on the path, and /usr/bin is on the path.

I think this is the error message you want to home in on. I would check system logs such as /var/log/messages or journalctl for errors. It may also help to start a job and run through those scripts manually: change into that log directory and you’ll see shell scripts you can run, or you can run the commands directly and see what other output appears interactively. That would be my suggestion for triage: replicate all these commands interactively and check the system logs.
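As a starting point for that kind of triage, a small filter can pull the interesting lines out of a job's output log. This is a sketch only: scan_log is a hypothetical helper, and the patterns are just the error strings quoted earlier in this thread, not an exhaustive list.

```shell
#!/bin/sh
# Print (with line numbers) the lines of a log file that match error
# signatures seen in this thread. Returns non-zero if nothing matches.
# (scan_log is a hypothetical helper name.)
scan_log() {
  grep -n -i -E \
    'could not be initialized|session bus|command not found|SESSION_MANAGER' \
    "$1"
}

# Example use from the session's output directory on the compute node:
#   scan_log output.log
# System logs can be checked alongside it, for example:
#   journalctl --since "15 min ago"
```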

We’re still having no luck getting past the Xfconf error. I started from scratch and reinstalled all of the prerequisites. We do not have experience with TurboVNC or Xfce.

I notice that you have an anaconda3 module loaded. Does its bin directory contain any dbus-* commands? If so, they are overriding the system versions and are likely to cause problems. Try deleting them (or disabling them) and see if that works.

Yes, definitely! The Anaconda dbus-* is a problem for OnDemand desktop applications.

We’ve had much grief because of that.
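To see whether a module is shadowing the system D-Bus tools, it can help to list every dbus-* executable on PATH in search order. The sketch below assumes nothing about where Anaconda is installed; list_dbus_on_path is a made-up helper name.

```shell
#!/bin/sh
# List every executable named dbus-* in each PATH directory, in PATH order.
# For any given name, the first path printed is the one the shell will run.
# (list_dbus_on_path is a hypothetical helper name.)
list_dbus_on_path() {
  local dir f
  local IFS=:
  for dir in $PATH; do
    [ -n "$dir" ] || continue          # skip empty PATH entries
    for f in "$dir"/dbus-*; do
      # Keep only regular executables; an unmatched glob fails this test.
      [ -x "$f" ] && [ ! -d "$f" ] && echo "$f"
    done
  done
  return 0
}

list_dbus_on_path
```

Run it after the module loads in the job script. If a path under the Anaconda tree is printed before /usr/bin or /bin, that copy is the one the desktop components will pick up.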

So I installed a new anaconda3 module just for VMD and made sure it contains no dbus-* files. I get the following, so we are still having Xfconf issues.

Setting VNC password…
Starting VNC server…

Desktop ‘TurboVNC: cn10:1 (wew)’ started on display cn10:1

Log file is vnc.log
Successfully started VNC server on cn10:5901…
Script starting…
Starting websocket server…
Is dbus on path?
/bin/dbus-launch
Starting /bin/xfwm4
The system default contains no modules
(env var: LMOD_SYSTEM_DEFAULT_MODULES is empty)
No changes in loaded modules

+ xfwm4 --compositor=off --sm-client-disable
WARNING: no 'numpy' module, HyBi protocol will be slower
WebSocket server settings:
  - Listen on :14593
  - Flash security policy server
  - No SSL/TLS support (no cert file)
  - Backgrounding (daemon)
Scanning VNC log file for user authentications…
Generating connection YAML file…

(xfwm4:3816896): xfwm4-CRITICAL **: 15:05:08.250: Xfconf could not be initialized

(xfwm4:3816896): xfwm4-WARNING **: 15:05:08.250: Missing data from default files

+ xsetroot -solid '#D3D3D3'
+ xfsettingsd --sm-client-disable
+ xfce4-panel --sm-client-disable
xfsettingsd: Could not connect: No such file or directory.

(xfsettingsd:3816939): xfsettingsd-ERROR **: 15:05:08.284: Failed to connect to the dbus session bus.

(xfce4-panel:3816940): xfce4-panel-WARNING **: 15:05:08.304: Failed to connect to the D-BUS session bus: Could not connect: No such file or directory
xfce4-panel: There is already a running instance

Currently Loaded Modules:

  1) anaconda3/2021.05-vmd   2) virtualgl/2.6.6   3) vmd/1.9.4
+ xfce4-terminal -e 'vglrun vmd' -T 'VMD Terminal' --disable-server
Failed to connect to session manager: Failed to connect to the session manager: SESSION_MANAGER environment variable not defined
Cleaning up…
Killing Xvnc process ID 3816833
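Since the log still shows "Failed to connect to the dbus session bus" even with /bin/dbus-launch on the path, one further thing that could be checked is what session bus address the job environment carries. The sketch below is an assumption about where to look, not a confirmed diagnosis, and bus_socket_status is a hypothetical helper name.

```shell
#!/bin/sh
# Describe the D-Bus session bus address found in the environment:
#   unset          - DBUS_SESSION_BUS_ADDRESS is empty or not set
#   ok: PATH       - a unix:path= address whose socket path exists
#   missing: PATH  - a unix:path= address whose socket path does not exist
#   other: ADDR    - any other address form (not checked further)
# (bus_socket_status is a hypothetical helper name.)
bus_socket_status() {
  local addr path
  addr="${DBUS_SESSION_BUS_ADDRESS:-}"
  case "$addr" in
    "")
      echo "unset" ;;
    unix:path=*)
      path="${addr#unix:path=}"   # drop the scheme prefix
      path="${path%%,*}"          # drop any ",guid=..." suffix
      if [ -e "$path" ]; then echo "ok: $path"; else echo "missing: $path"; fi ;;
    *)
      echo "other: $addr" ;;
  esac
}

bus_socket_status

# One possible follow-up (an assumption, not something confirmed in this
# thread): start a private session bus before the Xfce components launch.
#   eval "$(dbus-launch --sh-syntax)"
```

A "missing" result would at least be consistent with the "No such file or directory" text in the log, for example if the variable was inherited from the submitting environment.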

I am getting the same issue

Loaded modules: VirtualGL, Pango, VMD-1.9.3 - all built via easyBuild.

xfce4-terminal -e 'vglrun vmd' -T 'VMD Terminal' --disable-server
Failed to connect to session manager: Failed to connect to the session manager: SESSION_MANAGER environment variable not defined
Cleaning up…

Wondering if anyone has come up with anything to solve this yet.

If I run interactively with only the VMD module loaded, I am able to get the VMD GUI via X11. However, if I run the same VMD command in the OOD script, I get an “X11 composite is enabled” warning and the program quits. Previously, I had success running MATLAB and ParaView on OOD via xfce4 without vglrun. Full output.log below:

Setting VNC password…
Starting VNC server…

WARNING: compute-3-28.saber:1 is taken because of /tmp/.X1-lock
Remove this file if there is no X server compute-3-28.saber:1

Desktop ‘TurboVNC: compute-3-28.saber:2 (himanshu)’ started on display compute-3-28.saber:2

Log file is vnc.log
Successfully started VNC server on compute-3-28.saber:5902…
Script starting…
Starting websocket server…
The system default contains no modules
(env var: LMOD_SYSTEM_DEFAULT_MODULES is empty)
No changes in loaded modules

+ xfwm4 --compositor=off --daemon --sm-client-disable
WebSocket server settings:
  - Listen on :28034
  - Flash security policy server
  - No SSL/TLS support (no cert file)
  - Backgrounding (daemon)
Scanning VNC log file for user authentications…
Generating connection YAML file…

(xfwm4:54961): GLib-CRITICAL **: 21:11:56.240: g_str_has_prefix: assertion ‘prefix != NULL’ failed

(xfwm4:54961): xfwm4-WARNING **: 21:11:56.275: The property ‘/general/double_click_distance’ of type int is not supported

+ xsetroot -solid '#D3D3D3'
+ xfsettingsd --sm-client-disable
+ xfce4-panel --sm-client-disable

(xfsettingsd:54978): GLib-CRITICAL **: 21:11:56.738: g_str_has_prefix: assertion ‘prefix != NULL’ failed

(xfsettingsd:54978): GLib-GObject-CRITICAL **: 21:11:56.739: g_value_get_string: assertion ‘G_VALUE_HOLDS_STRING (value)’ failed

(xfsettingsd:54978): GLib-GObject-CRITICAL **: 21:11:56.740: g_value_get_string: assertion ‘G_VALUE_HOLDS_STRING (value)’ failed

Currently Loaded Modules:

  1. libpciaccess/0.14-GCCcore-7.3.0
  2. libunwind/1.2.1-GCCcore-7.3.0
  3. VirtualGL/2.6.1-foss-2018b
  4. binutils/2.28-GCCcore-6.4.0
  5. numactl/2.0.11-GCCcore-6.4.0
  6. hwloc/1.11.7-GCCcore-6.4.0
  7. GCC/6.4.0-2.28
  8. OpenMPI/2.1.1-GCC-6.4.0-2.28
  9. FFTW/3.3.6-gompi-2017b
  10. gompi/2017b
  11. OpenBLAS/0.2.20-GCC-6.4.0-2.28
  12. ScaLAPACK/2.0.2-gompi-2017b-OpenBLAS-0.2.20
  13. expat/2.2.4-GCCcore-6.4.0
  14. bzip2/1.0.6-GCCcore-6.4.0
  15. libpng/1.6.32-GCCcore-6.4.0
  16. freetype/2.8-GCCcore-6.4.0
  17. fontconfig/2.12.4-GCCcore-6.4.0
  18. xorg-macros/1.19.1-GCCcore-6.4.0
  19. libffi/3.2.1-GCCcore-6.4.0
  20. XZ/5.2.3-GCCcore-6.4.0
  21. libxml2/2.9.4-GCCcore-6.4.0
  22. gettext/0.19.8.1-GCCcore-6.4.0
  23. PCRE/8.41-GCCcore-6.4.0
  24. util-linux/2.31-GCCcore-6.4.0
  25. GLib/2.53.5-GCCcore-6.4.0
  26. pixman/0.34.0-GCCcore-6.4.0
  27. cairo/1.14.10-GCCcore-6.4.0
  28. HarfBuzz/1.7.1-foss-2017b
  29. Pango/1.41.0-foss-2017b
  30. Tcl/8.6.7-GCCcore-6.4.0
  31. Tk/8.6.7-foss-2017b
  32. GMP/6.1.2-GCCcore-6.4.0
  33. nettle/3.3-GCCcore-6.4.0
  34. X11/20171023-GCCcore-6.4.0
  35. libdrm/2.4.88-GCCcore-6.4.0
  36. ncurses/6.0-GCCcore-6.4.0
  37. zlib/1.2.11-GCCcore-6.4.0
  38. LLVM/5.0.0-foss-2017b
  39. foss/2017b
  40. Mesa/17.2.5-foss-2017b
  41. libGLU/9.0.0-foss-2017b
  42. GCCcore/6.4.0
  43. NASM/2.13.01-GCCcore-6.4.0
  44. libjpeg-turbo/1.5.2-GCCcore-6.4.0
  45. xprop/1.2.2-GCCcore-6.4.0
  46. FLTK/1.3.4-foss-2017b
  47. libreadline/7.0-GCCcore-6.4.0
  48. SQLite/3.20.1-GCCcore-6.4.0
  49. Python/2.7.14-foss-2017b
  50. Szip/2.1.1-GCCcore-6.4.0
  51. HDF5/1.10.1-foss-2017b
  52. cURL/7.55.1-GCCcore-6.4.0
  53. netCDF/4.5.0-foss-2017b
  54. Yasm/1.3.0-GCCcore-6.4.0
  55. x264/20170721-GCCcore-6.4.0
  56. x265/2.6-GCCcore-6.4.0
  57. LAME/3.100-GCCcore-6.4.0
  58. FFmpeg/3.4-GCCcore-6.4.0
  59. LibTIFF/4.0.9-GCCcore-6.4.0
  60. Ghostscript/9.22-GCCcore-6.4.0
  61. JasPer/2.0.14-GCCcore-6.4.0
  62. LittleCMS/2.8-GCCcore-6.4.0
  63. ImageMagick/7.0.7-15-GCCcore-6.4.0
  64. ACTC/1.1-GCCcore-6.4.0
  65. VMD/1.9.3-foss-2017b-Python-2.7.14
+ export VMDOPTIXDEVICEMASK=0x1
+ VMDOPTIXDEVICEMASK=0x1
+ xfce4-terminal -e 'vglrun -c proxy vmd' -T 'VMD Terminal' --disable-server
Failed to connect to session manager: Failed to connect to the session manager: SESSION_MANAGER environment variable not defined

(xfce4-terminal:54990): Gtk-WARNING **: GModule (/usr/lib64/gtk-3.0/3.0.0/immodules/im-ibus.so) initialization check failed: GLib version too old (micro mismatch)

(xfce4-terminal:54990): Gtk-WARNING **: Loading IM context type ‘ibus’ failed
Cleaning up…
Killing Xvnc process ID 54916
xfwm4: Fatal IO error 11 (Resource temporarily unavailable) on X server :2.
xfce4-panel: Fatal IO error 11 (Resource temporarily unavailable) on X server :2.
xfsettingsd: Fatal IO error 11 (Resource temporarily unavailable) on X server :2.

Removing vglrun and running VMD directly seems to work for me.

This works: xfce4-terminal -e 'vmd' -T 'VMD Terminal' --disable-server

I am curious to know whether there are any advantages to running VMD through VirtualGL, as everyone seems to be running it with vglrun.

When I tried your fix I get:

+ xfce4-terminal -e vmd -T 'VMD Terminal' --disable-server
Failed to connect to session manager: Failed to connect to the session manager: SESSION_MANAGER environment variable not defined

I think you missed putting ‘’ around vmd. See my example above.

This is the line from script.sh.erb:
xfce4-terminal -e "vmd" -T "VMD Terminal" --disable-server

This is the full output from the run:
Setting VNC password…
Starting VNC server…

Desktop ‘TurboVNC: cn10:1 (wew)’ started on display cn10:1

Log file is vnc.log
Successfully started VNC server on cn10:5901…
Script starting…
Starting websocket server…
Is dbus on path?
/bin/dbus-launch
Starting /bin/xfwm4
The system default contains no modules
(env var: LMOD_SYSTEM_DEFAULT_MODULES is empty)
No changes in loaded modules

+ xfwm4 --compositor=off --sm-client-disable
WARNING: no 'numpy' module, HyBi protocol will be slower
WebSocket server settings:
  - Listen on :48035
  - Flash security policy server
  - No SSL/TLS support (no cert file)
  - Backgrounding (daemon)
Scanning VNC log file for user authentications…
Generating connection YAML file…

(xfwm4:1194932): xfwm4-CRITICAL **: 10:23:47.364: Xfconf could not be initialized

(xfwm4:1194932): xfwm4-WARNING **: 10:23:47.364: Missing data from default files

+ xsetroot -solid '#D3D3D3'
/home/wew/ondemand/data/sys/dashboard/batch_connect/dev/bc_osc_vmd/output/b40e6220-5194-40d7-8805-f8f3ced64fc0/script.sh: line 27: xsetroot: command not found
+ xfsettingsd --sm-client-disable
/home/wew/ondemand/data/sys/dashboard/batch_connect/dev/bc_osc_vmd/output/b40e6220-5194-40d7-8805-f8f3ced64fc0/script.sh: line 28: xfsettingsd: command not found
+ xfce4-panel --sm-client-disable
/home/wew/ondemand/data/sys/dashboard/batch_connect/dev/bc_osc_vmd/output/b40e6220-5194-40d7-8805-f8f3ced64fc0/script.sh: line 29: xfce4-panel: command not found

Currently Loaded Modules:

  1) anaconda3/2021.05-vmd   2) virtualgl/2.6.6   3) vmd/1.9.4
+ xfce4-terminal -e vmd -T 'VMD Terminal' --disable-server
Failed to connect to session manager: Failed to connect to the session manager: SESSION_MANAGER environment variable not defined
Cleaning up…
Killing Xvnc process ID 1194901

Are you able to run any software using Xfce and TurboVNC? Or is this issue exclusive to VMD?

Plus, can you confirm you have installed X Window on the compute node that’s running this code?

sudo yum groupinstall "X Window System"

We don’t use Xfce or TurboVNC for any other applications, so this is all new for us. I am now able to get something to come up, but the windows overlap and cannot be moved. I am attaching a screenshot. We’ve done a group install of Xfce. We are on Red Hat 8 and there is no X Window System.

For that run, the following is the output before I quit:

Setting VNC password…
Starting VNC server…

Desktop ‘TurboVNC: cn10:1 (wew)’ started on display cn10:1

Log file is vnc.log
Successfully started VNC server on cn10:5901…
Script starting…
Starting websocket server…
Is dbus on path?
/bin/dbus-launch
Starting /bin/xfwm4
The system default contains no modules
(env var: LMOD_SYSTEM_DEFAULT_MODULES is empty)
No changes in loaded modules

+ xfwm4 --compositor=off --sm-client-disable
WARNING: no 'numpy' module, HyBi protocol will be slower
WebSocket server settings:
  - Listen on :46124
  - Flash security policy server
  - No SSL/TLS support (no cert file)
  - Backgrounding (daemon)
Scanning VNC log file for user authentications…
Generating connection YAML file…

(xfwm4:1632921): xfwm4-CRITICAL **: 12:41:53.449: Xfconf could not be initialized

(xfwm4:1632921): xfwm4-WARNING **: 12:41:53.450: Missing data from default files

+ xsetroot -solid '#D3D3D3'
+ xfsettingsd --sm-client-disable
xfsettingsd: Could not connect: No such file or directory.

(xfsettingsd:1632966): xfsettingsd-ERROR **: 12:41:53.474: Failed to connect to the dbus session bus.

Currently Loaded Modules:

  1) anaconda3/2021.05-vmd   2) virtualgl/2.6.6   3) vmd/1.9.4
+ xfce4-terminal -e vmd -T 'VMD Terminal' --disable-server
Failed to connect to session manager: Failed to connect to the session manager: SESSION_MANAGER environment variable not defined
/home/wew/ondemand/data/sys/dashboard/batch_connect/dev/bc_osc_vmd/output/58252e11-83b9-429c-94fb-c9961e74abd0/script.sh: line 30: 1632966 Trace/breakpoint trap (core dumped) xfsettingsd --sm-client-disable
+ xfce4-panel --sm-client-disable

(xfce4-panel:1632988): xfce4-panel-WARNING **: 12:41:53.766: Failed to connect to the D-BUS session bus: Could not connect: No such file or directory

(xfce4-panel:1632988): xfce4-panel-CRITICAL **: 12:41:53.767: Name org.xfce.Panel lost on the message dbus, exiting.
xfce4-panel: There is already a running instance

Setting VNC password…
Generating connection YAML file…

We are getting the following error which I surmise must be related to the windows for vmd not quite working correctly.

Detected X11 ‘Composite’ extension: if incorrect display occurs try disabling this X server option.

Where is the composite setting in the osc_vmd environment? The windows that are launched with VMD cannot be moved and the display window is partially outside the turbovnc window as shown in the image above.
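As far as I can tell, the --compositor=off flag in the job script only turns off xfwm4's own compositor, while the VMD message refers to the X server's Composite extension, which is a separate thing. A quick way to see whether the extension is present in the VNC session is to look for it in xdpyinfo output. This is a sketch only, and lists_composite is a made-up helper name.

```shell
#!/bin/sh
# Read `xdpyinfo` output on stdin and succeed if a line consisting only of
# the extension name "Composite" is present.
# (lists_composite is a hypothetical helper name.)
lists_composite() {
  grep -q -E '^[[:space:]]*Composite[[:space:]]*$'
}

# Only query the X server when xdpyinfo is installed and DISPLAY is set,
# for example from a terminal inside the running VNC session.
if command -v xdpyinfo >/dev/null 2>&1 && [ -n "${DISPLAY:-}" ]; then
  if xdpyinfo 2>/dev/null | lists_composite; then
    echo "Composite extension is listed on $DISPLAY"
  else
    echo "Composite extension is not listed on $DISPLAY"
  fi
fi
```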

Just to document, we had a discussion about this as part of the SC’21 OOD User Group BoF. We think the issue is with the xfce config and not VMD specific based upon comparing screenshots of what we see at OSC with our VMD install and the screenshot already attached.

The path forward is to first get a basic remote desktop working by following these instructions: Enable Interactive Desktop — Open OnDemand 2.0.13 documentation

I have the desktop working. I am seeing one issue with power manager. Getting the following. Any way to suppress it?

With the desktop working, and comparing the yml files, I am still getting the issue where I cannot move or resize windows for vmd like in the image up above.