We are working on rebuilding our cluster with new hardware and networking. The idea was to have both clusters running simultaneously and let users test on the new cluster before we move the rest of the nodes over. We also spun up a new instance running the latest OOD (required for RHEL 9.1); our old instance is OOD 1.8. Home directories are shared between the two.
What I’m now running into is this: when I launch an interactive app on one of them, delete it, log out, and then go to the other instance and launch an interactive app, the app instantly ‘completes’ even though the job is still running in the Slurm scheduler. This happens in both directions. It seems to resolve when I completely shut down OOD on the side that was working and then try again on the other side.
It seems to me like the only thing that could be causing this is the ~/ondemand directory shared between them, but lsof on either side doesn’t show any open file handles under ~/ondemand. I thought the per-user nginx process might be holding something open, like a session socket, but that doesn’t seem to be the case.
Any thoughts on this would be greatly appreciated.
Thanks for your post. I’ve looked through the code and the documentation, and I’m unable to find any functionality that allows you to change your home directory.
We think we found the cause. It occurs when the user still has a browser open to the other OOD instance (even an idle tab, or a different browser entirely). It seems to be some interaction between the sessions and the shared home directory.
Yes, your intuition is correct: the two instances conflict with each other. Let’s call them clusters A and B. When you schedule a job on A, the OnDemand instance on B sees the same session through the shared home directory, queries its own scheduler for that job id, and doesn’t find it, so it assumes the job is complete and marks it as such.
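For context, the dashboard keeps a small record of each interactive session in the home directory, and each instance polls the scheduler named in that record. The path and key names below are illustrative assumptions, not the exact schema:

```yaml
# Illustrative sketch of a batch_connect session record, assumed to live under
# ~/ondemand/data/sys/dashboard/batch_connect/db/ (path and keys are assumptions)
id: "f2c9a7d4"        # session identifier
cluster_id: "owens"   # each instance resolves this against ITS OWN clusters.d/owens.yml
job_id: "1234567"     # B asks its scheduler about this id, gets nothing back,
                      # and marks the session completed
```

Because both instances define a cluster named owens, each one believes the record describes a job on its own scheduler.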
You have a couple options, though none are without some work. I’d suggest the first as it’s the cleanest for your users.
Rename the cluster on B. Say the cluster is named owens, so you have /etc/ood/clusters.d/owens.yml on both systems. You could rename the one on B to /etc/ood/clusters.d/owens-upgrade.yml. This makes OnDemand see them as entirely different schedulers, so they stop conflicting with each other. It does, however, require you to update your apps to use both clusters.
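A minimal sketch of the renamed file on B, assuming the standard v2 cluster config format (the title and hostname here are placeholders; copy your existing owens.yml and adjust):

```yaml
# /etc/ood/clusters.d/owens-upgrade.yml on B
v2:
  metadata:
    title: "Owens (New Hardware)"   # what users see in the dashboard
  login:
    host: "owens-new.example.edu"   # placeholder login host for the new cluster
  job:
    adapter: "slurm"
    bin: "/usr/bin"                 # path to the Slurm client binaries
```

Your interactive apps on the B side would then need cluster: "owens-upgrade" (or a list of both clusters) in their form.yml.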
You could use different ondemand_portal settings. This defaults to ondemand, which is why the per-user state lands in ~/ondemand. If you set B’s ondemand_portal to, say, owens-new, then B would start creating directories under ~/owens-new instead. This may get you into other trouble with data sharing, though: for example, Job Composer data (jobs, templates, and so on) won’t be shared between the two instances, which could be unexpected for your users. See the nginx_stage.yml page in the Open OnDemand documentation for the full list of options.
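On B that would look something like the following in /etc/ood/config/nginx_stage.yml (owens-new is just the example name from above):

```yaml
# /etc/ood/config/nginx_stage.yml on B
# Per-user OnDemand state on B would then live under ~/owens-new,
# leaving ~/ondemand to the old instance on A.
ondemand_portal: "owens-new"
```

Users should then get a fresh ~/owens-new tree on B, while A keeps using ~/ondemand untouched.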