Persistent Remote Applications Running as Interactive Jobs & Other Related Features

I’m not quite sure if my title gets across what I’m asking for. I talked a little bit about this with some of you at PEARC, and I wanted to do a few things here: first, type it out a bit more clearly; then look at what similar features have been requested; and finally talk about possible implementations and software to do it (though I don’t think we are limited to these options). Also, brevity is not my strong suit, so I apologize for the length of this post.

I would like all interactive applications, started as jobs on the cluster, to have the option to be persistent for the duration of that job. That means an application could be started as an interactive job and the user could join the session at any time. They could also leave the session and have the job continue. I believe this is already how apps like Jupyter notebooks work: the server is running, the user can connect to it, and if they close their tab they can come back to it as long as the job is still running. One of the use cases here is CFD or FEA applications that have a traditional GUI but may have a long-running job that needs to be, or is easier to be, set up in that GUI. Right now, if a user tries to do this inside a desktop session and they lose connection or close the tab, their job ends.

I went ahead and searched through previous feature requests and questions to see if something like this has been asked for before. A very similar but not quite identical request was made for restartable interactive apps here. I think the user there is trying to do roughly what I am proposing but isn’t requesting it as a feature. Then there are a couple of related threads on using tmux, screen, or a similar session system. Threads like this one and this one talk about persistent interactive terminals. They propose a system similar to what I want, but for non-graphical applications. It also seems the easy solution to some of these problems is to run a terminal application in the graphical system.

I assume this would be another option alongside noVNC for graphical applications and would most likely work in a similar way. In theory the same idea of resuming a session could be applied to something like screen or tmux as well, and the framework here could be adapted for them if administrators so choose.

Now for some possible solutions and options for implementing this. I see two promising possibilities: one is called Xpra (short for X Persistent Remote Applications) and the other is X2Go. I prefer Xpra, as I think it will be easier to implement and it has a larger feature set. X2Go uses the same compression libraries as NX3/NoMachine and works more like traditional remote desktop or VNC software.

Both let sessions persist after a disconnect, have HTML clients that can display remote systems in the browser, can forward either full desktops or individual applications, are open source projects, and work over SSH. Xpra can also work over plain TCP, WebSockets, SSL, and a number of other protocols and authentication methods.
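To make the Xpra option a little more concrete, here is a rough sketch in Python of the two commands involved: one that a job script might run to start a single application under an Xpra server, and one that a user might run later to reattach. This is only my assumption of how it could be wired up, not a tested OnDemand integration. The helper names, display number, port, user name, and host name are all hypothetical, and authentication is left out entirely.

```python
import shlex


def xpra_start_command(display, app, port, html=True):
    """Build the argv for starting one application under an Xpra server.

    Hypothetical helper: display, app, and port are illustrative values.
    """
    return [
        "xpra", "start", f":{display}",
        f"--start-child={app}",         # application to run inside the session
        "--exit-with-children",         # stop the server when the application exits
        f"--bind-tcp=0.0.0.0:{port}",   # plain TCP listener; no authentication configured here
        f"--html={'on' if html else 'off'}",  # serve the browser client on the same port
        "--daemon=no",                  # stay in the foreground so the batch job keeps running
    ]


def xpra_attach_command(user, host, display):
    """Build the argv for reattaching to a running session over SSH."""
    return ["xpra", "attach", f"ssh://{user}@{host}/{display}"]


if __name__ == "__main__":
    # Print the commands instead of running them, so this is safe to try anywhere.
    print(shlex.join(xpra_start_command(100, "xterm", 14500)))
    print(shlex.join(xpra_attach_command("alex", "node042", 100)))
```

The point is that the application belongs to the Xpra server inside the job rather than to the viewer, so closing the browser tab or the client should only detach from it.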

Overall, the features I’m requesting are persistent X11 applications and the ability to forward individual applications as well as full desktops. I’m not sure of all the details, and in an ideal world this would be a pull request rather than a feature request, but alas I don’t have the time or current knowledge to do that. In the future, though, I may be able to test some of this out when I get some OOD installs into production.

I run MATLAB GUI in the way you describe but I haven’t seen the behavior of the application stopping while the user is disconnected. Can you expand on the behavior and how to trigger it?

Alex: Thanks for posting. I’m a bit confused and wonder if there is a terminology issue or a configuration difference here. In particular, this statement you made is confusing to me: “Right now if a user tries to do this inside a desktop session and they lose connection or close the tab their job ends”.

To try to clarify, let’s use ANSYS as an example. At OSC, I can right now request, say, an 8-hour ANSYS job. Once it starts, I can connect to the interactive ANSYS Workbench and do whatever I want. If after 1 hour I’m done for a while, I can just close that tab or even power off my laptop, and ANSYS will continue running just fine for another 7 hours. I could even log on 4 hours later from a different laptop and reconnect right where I left off. The only ways to force ANSYS to stop running are to use the ANSYS menu to quit the program, or to go to the job’s card in the OnDemand interface and select DELETE.

You stated “Overall the features I’m requesting are persistent x11 applications and the ability to forward individual applications as well as full desktops” and it seems to me we already have those features. Can you clarify a bit?