Running Interactive Desktop card disappears

lcrownover · December 11, 2023, 6:37pm

We’re running OOD 3.0.1 and we have plenty of users who’re using OOD just fine. One user, however, is having an issue where she will submit a new interactive desktop job, the queued card appears, then as soon as the job starts, the card disappears. You can see the job running, but the green Running card is missing. Once the job completes, the card reappears and shows Completed like normal.

I’ve crawled around these forums and some things I found were suggestions about cleaning up .bashrc and .bash_profile, which I’ve done. I’ve also tried deleting her ~/ondemand directory.

What might the next troubleshooting steps look like?

Thank you!

jeff.ohrstrom · December 11, 2023, 6:44pm

Hi and welcome!

Here is the documentation page on where to look, specifically the log directory of that particular job.

https://osc.github.io/ood-documentation/latest/how-tos/debug/debug-interactive-apps.html

It sounds like the issue is stemming from this (given it’s only affecting a single user), but the output.log of the job(s) that fail will give a much better indication of why it’s failing.

lcrownover · December 11, 2023, 7:37pm

Thanks for the info. I’ve compared the affected user’s output.log and my own (which works fine) and attached only the differences at the bottom.

The only real difference I’m seeing is that mine reports setting the VNC password and generating the connection YAML file. However, I do see a connection.yml in her output directory as well, and it seems to have valid information in it (passwords are set, etc).
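For anyone repeating this comparison, here is a minimal sketch of one way to diff two output.log files without PIDs and timestamps drowning out the real differences. The function names (`normalize`, `diff_logs`) and the regular expressions are my own assumptions based on the log lines shown in this thread; they are not part of OOD.

```python
import difflib
import re

# Details that change on every run and only add noise to a diff.
PAREN_PID = re.compile(r"\((?P<name>[\w.-]+):\d+\)")     # e.g. (nm-applet:988950)
BRACKET_PID = re.compile(r"\[\d+\]")                      # e.g. mate-session[989373]
TIMESTAMP = re.compile(r"\b\d{2}:\d{2}:\d{2}\.\d{3}\b")   # e.g. 10:11:56.971


def normalize(line: str) -> str:
    """Replace run-specific PIDs and timestamps with fixed placeholders."""
    line = PAREN_PID.sub(r"(\g<name>:PID)", line)
    line = BRACKET_PID.sub("[PID]", line)
    line = TIMESTAMP.sub("HH:MM:SS.mmm", line)
    return line.rstrip()


def diff_logs(broken: str, working: str) -> list:
    """Return a unified diff of two logs after normalization, skipping blank lines."""
    a = [normalize(l) for l in broken.splitlines() if l.strip()]
    b = [normalize(l) for l in working.splitlines() if l.strip()]
    return list(difflib.unified_diff(a, b, "broken", "working", lineterm="", n=0))
```

Reading both files with `open(path).read()` and printing `"\n".join(diff_logs(broken_text, working_text))` should leave only the lines that genuinely differ between the two sessions.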

broken:

(nm-applet:988950): Gtk-WARNING **: 10:11:56.971: gtk_widget_size_allocate(): attempt to allocate widget with width -1 and height 1

(mate-settings-daemon:988917): dbind-WARNING **: 10:12:00.727: AT-SPI: Error in GetItems, sender=(null), error=Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.


(mate-settings-daemon:988917): GLib-GObject-CRITICAL **: 10:25:46.606: g_object_unref: assertion 'G_IS_OBJECT (object)' failed

(mate-settings-daemon:988917): GLib-GObject-CRITICAL **: 10:25:46.607: g_object_unref: assertion 'G_IS_OBJECT (object)' failed

(mate-settings-daemon:988917): GLib-GObject-CRITICAL **: 10:25:46.607: g_object_unref: assertion 'G_IS_OBJECT (object)' failed

(mate-settings-daemon:988917): GLib-GObject-CRITICAL **: 10:25:46.607: g_object_unref: assertion 'G_IS_OBJECT (object)' failed

(mate-settings-daemon:988917): GLib-GObject-CRITICAL **: 10:25:46.607: g_object_unref: assertion 'G_IS_OBJECT (object)' failed

(mate-settings-daemon:988917): GLib-GObject-CRITICAL **: 10:25:46.607: g_object_unref: assertion 'G_IS_OBJECT (object)' failed

works:

(mate-settings-daemon:989429): dbind-WARNING **: 10:13:17.071: AT-SPI: Error in GetItems, sender=org.freedesktop.DBus, error=Message recipient disconnected from message bus without replying
Setting VNC password...
Generating connection YAML file...
mate-session[989373]: CRITICAL: gsm_systemd_set_session_idle: assertion 'session_path != NULL' failed

jeff.ohrstrom · December 11, 2023, 10:10pm

Yea it’s hard to say what’s relevant there or not. I’m not able to discern anything really relevant, but again, the ~/.bashrc or similar is where I would look. I’d also check to see if bash is even their SHELL (could be some other rc file you’re looking for).

Typically user specific conda environments can throw this off. That’s what comes to my mind anyhow.
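As a rough sketch of that check, the snippet below looks up a user's login shell and lists which startup files exist for it, then flags the marker that `conda init` writes. The helper names and the `RC_FILES` table are my own assumptions for illustration; the table is not exhaustive, and sites may source additional files.

```python
import os
import pwd

# Common per-user startup files by shell (not exhaustive).
RC_FILES = {
    "bash": [".bash_profile", ".bash_login", ".profile", ".bashrc"],
    "zsh": [".zshenv", ".zprofile", ".zshrc", ".zlogin"],
    "tcsh": [".tcshrc", ".cshrc", ".login"],
    "csh": [".cshrc", ".login"],
}

CONDA_MARKER = "# >>> conda initialize >>>"  # line written by `conda init`


def rc_candidates(shell_path: str) -> list:
    """Return the startup file names worth checking for a given login shell."""
    return RC_FILES.get(os.path.basename(shell_path), [])


def has_conda_init(text: str) -> bool:
    """True if the file content contains the block that `conda init` adds."""
    return CONDA_MARKER in text


def existing_rc_files(username: str) -> list:
    """List the startup files that actually exist in the user's home directory."""
    entry = pwd.getpwnam(username)
    paths = [os.path.join(entry.pw_dir, n) for n in rc_candidates(entry.pw_shell)]
    return [p for p in paths if os.path.exists(p)]
```

Running `existing_rc_files("someuser")` as root (or as the user) gives the short list of files to read through, with `someuser` being a placeholder account name.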

lcrownover · December 11, 2023, 11:49pm

Yeah, unfortunately I’ve already looked in her .bashrc and .bash_profile and they’re just the basic system defaults from RHEL (and she’s using bash, not zsh or another shell).

I wonder what causes the visibility of the card in the interactive sessions view. Like, OOD has to determine what to show, I wonder if I can trace down what’s causing it to not find that job in a list somewhere.

jeff.ohrstrom · December 12, 2023, 1:47pm

It’s the connection.yml + the state of the job. The job states from the scheduler correspond to what we show, with the exception of the starting state.

Starting: no connection.yml but the job is in a Running state
Running: a valid connection.yml and the job is in a Running state

If the job gets completed on the OOD side then the job is actually completed on the scheduler side. Otherwise we’d sit in starting state for the entire job’s duration waiting for the connection.yml.
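The rule described above can be sketched as a small function. This is only an illustration of the logic as stated in this post, not OOD's actual implementation (OOD is written in Ruby); the function name and the lowercase state strings are hypothetical.

```python
def card_state(scheduler_state: str, has_connection_file: bool) -> str:
    """Map the scheduler's job state plus connection.yml presence to a card label.

    scheduler_state uses hypothetical lowercase names such as
    "queued", "running", and "completed".
    """
    if scheduler_state == "running":
        # The only case where the card differs from the scheduler:
        # a running job without a connection.yml is shown as Starting.
        return "Running" if has_connection_file else "Starting"
    # Every other state is shown as the scheduler reports it.
    return scheduler_state.capitalize()
```

So a job that the scheduler reports as running but that never writes a connection.yml would stay in Starting until the scheduler itself marks it completed.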

You can likely confirm this in sacct (Slurm’s historic accounting information) or similar to see the job only ran for a minute or so.
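On a Slurm cluster with accounting enabled, that check could look roughly like the sketch below, which calls `sacct` for one job and converts the elapsed time to seconds. The helper names are hypothetical, and the field list (`JobID,State,Elapsed,ExitCode`) is just one reasonable choice.

```python
import subprocess


def elapsed_seconds(elapsed: str) -> int:
    """Convert Slurm's [DD-[HH:]]MM:SS elapsed format to seconds."""
    days = 0
    if "-" in elapsed:
        day_part, elapsed = elapsed.split("-", 1)
        days = int(day_part)
    parts = [int(p) for p in elapsed.split(":")]
    while len(parts) < 3:          # pad missing hours with zero
        parts.insert(0, 0)
    hours, minutes, seconds = parts
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def parse_sacct(output: str) -> list:
    """Parse pipe-delimited sacct output into one dict per job step."""
    rows = []
    for line in output.splitlines():
        if not line.strip():
            continue
        job_id, state, elapsed, exit_code = line.split("|")
        rows.append({"job_id": job_id, "state": state,
                     "elapsed_s": elapsed_seconds(elapsed),
                     "exit_code": exit_code})
    return rows


def job_history(job_id: str) -> list:
    """Ask sacct for the state and runtime of a single job."""
    result = subprocess.run(
        ["sacct", "-j", job_id, "--format=JobID,State,Elapsed,ExitCode",
         "--parsable2", "--noheader"],
        capture_output=True, text=True, check=True)
    return parse_sacct(result.stdout)
```

If `job_history(...)` shows a desktop job with an elapsed time of a minute or less, the session most likely exited on the compute node rather than OOD losing track of it.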

If it only impacts one user, that’s the only clue we have to go on. What scheduler do you use? Could there be something in ~/.slurm_defaults or similar? If it’s not their shell environment, then it must be something else specific to that user.

lcrownover · December 28, 2023, 5:34pm

Thanks for the detailed information. Somehow, the user reports the issue is magically fixed, and I didn’t do anything to fix it…

Either way, this info will help me troubleshoot if it comes back, or someone else experiences this issue.

Thanks!

system · June 25, 2024, 5:35pm

This topic was automatically closed 180 days after the last reply. New replies are no longer allowed.
