I have a group of users who noticed that when they launch interactive jobs and the request times out (e.g., over a flaky long-distance connection), they can end up in a weird state: no job card appears on the interactive jobs page, but the job is actually running. This particular group has learned to work around it by manually constructing an appropriate URL to reach their running job.
However, I feel that is really bad, because users who don't know this is a possible outcome will assume the job didn't launch and submit a second one, while the first job continues to burn through their CPU allocation units and take up space on the cluster. Imagine, for example, a machine-learning user's interactive Jupyter notebook requesting multiple GPUs and effectively taking an entire node offline; the user later wonders where all their CPU allocation units are going.
If this is not a known issue, I would like to submit a bug report. I assume I would need to provide additional technical details for reproduction.