"Extend" button for interactive sessions

This feature request is specifically for interactive sessions rather than batch jobs.

Motivation

I can imagine many scenarios in which a user may want to extend the time limit of an interactive session: for example, a long-running analysis is taking longer than they expected, or they did not think about adjusting the time limit at all when they started the session.

In our clusters, interactive sessions (e.g. for Jupyter Notebooks) often run on “cheap” queues where there isn’t much competition for resources, so we could easily afford to grant users a job extension without upsetting anyone else.

Furthermore, unlike batch jobs, interactive jobs involve user input, and it can be significantly harder and more time-consuming for users to recreate the state of a session after it has been killed for hitting its wall-time limit.

In a cloud context in particular, interactive sessions are a resource- and cost-effective alternative to “always-on” workstations, and an “Extend” button could help reduce users’ anxiety about switching to such a model.

Feature request

For select queues, add an “Extend” button to the interactive sessions list.

Thoughts on implementation

I am aware that extending job durations on HPC schedulers typically requires admin privileges, but perhaps one could build a solution via /etc/sudoers that allows users to run a restricted set of commands as administrator?

For Slurm, for example, this could be a locked-down scontrol update, restricted to a hardcoded queue, that lets users add time to existing jobs they own.

This may not be admissible in every security context, but could work in some. And perhaps there are better solutions.
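
A minimal sketch of such a locked-down wrapper, assuming a site admin installs it root-owned and permits it through a sudoers rule. The script path /usr/local/sbin/ood-extend-job, the partition name interactive, the 4-hour cap and all function names are hypothetical. It uses squeue to check that the caller (taken from SUDO_USER, which sudo sets) owns the job and that the job is on an allowed partition, then runs scontrol update JobId=<id> TimeLimit+=N, which according to the scontrol man page increments the current limit by N minutes (JobId must come before TimeLimit).

```python
"""Hypothetical locked-down wrapper around `scontrol update` for extending jobs.

Assumption: installed root-owned as /usr/local/sbin/ood-extend-job and allowed
by a sudoers rule along the lines of:
    %users ALL=(root) NOPASSWD: /usr/local/sbin/ood-extend-job
"""
import os
import subprocess

# Assumed site policy: only these partitions may be extended, by at most 4 hours.
ALLOWED_PARTITIONS = {"interactive"}
MAX_EXTENSION_MINUTES = 240


def parse_squeue_line(line: str) -> tuple[str, str]:
    """Parse the output of `squeue -h -j JOBID -o "%u %P"` into (owner, partition)."""
    owner, partition = line.split()
    return owner, partition


def check_request(caller: str, owner: str, partition: str, minutes: int) -> None:
    """Raise PermissionError or ValueError unless the request fits the policy."""
    if caller != owner:
        raise PermissionError(f"job belongs to {owner}, not {caller}")
    if partition not in ALLOWED_PARTITIONS:
        raise PermissionError(f"partition {partition!r} is not extendable")
    if not 0 < minutes <= MAX_EXTENSION_MINUTES:
        raise ValueError(f"extension must be 1..{MAX_EXTENSION_MINUTES} minutes")


def build_extend_command(job_id: int, minutes: int) -> list[str]:
    """Build the scontrol argv; `TimeLimit+=N` adds N minutes to the current limit."""
    return ["scontrol", "update", f"JobId={job_id}", f"TimeLimit+={minutes}"]


def main(argv: list[str]) -> int:
    """Entry point: argv is [job_id, minutes]; both must parse as integers."""
    job_id, minutes = int(argv[0]), int(argv[1])
    caller = os.environ["SUDO_USER"]  # set by sudo to the invoking user
    out = subprocess.run(
        ["squeue", "-h", "-j", str(job_id), "-o", "%u %P"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    owner, partition = parse_squeue_line(out)
    check_request(caller, owner, partition, minutes)
    subprocess.run(build_extend_command(job_id, minutes), check=True)
    return 0

# When installed as the sudo target, the script would end with:
#     import sys; sys.exit(main(sys.argv[1:]))
```

Passing argv as a list (no shell) and forcing both arguments to integers keeps the input surface small; whether that is acceptable at all remains a site security decision.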

We would like this too.

Another approach, which avoids using sudo for control, would be to submit the job with a long runtime but schedule a job-deletion task that the user can cancel.
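
One way to picture this: the job is submitted with a generous walltime (the hard limit), and a watchdog running as the user cancels it with scancel at an earlier soft deadline; “Extend” just pushes that soft deadline back, never past the hard limit. A minimal sketch, where SoftDeadline and watchdog_step are hypothetical names and how the watchdog is launched (e.g. from the session’s own job script) is left open:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class SoftDeadline:
    """A cancellable "delete job" task inside a deliberately long walltime."""
    job_id: int
    deadline: datetime    # when the watchdog cancels the job unless extended
    hard_limit: datetime  # submit time + the (long) walltime given to Slurm

    def extend(self, by: timedelta) -> datetime:
        """Push the deadline back, never past the hard limit; return the new deadline."""
        self.deadline = min(self.deadline + by, self.hard_limit)
        return self.deadline

    def expired(self, now: datetime) -> bool:
        return now >= self.deadline

    def cancel_command(self) -> List[str]:
        # scancel works on the user's own jobs, so no sudo is involved.
        return ["scancel", str(self.job_id)]


def watchdog_step(sd: SoftDeadline, now: datetime) -> Optional[List[str]]:
    """Return the command the watchdog should run at `now`, or None to keep waiting."""
    return sd.cancel_command() if sd.expired(now) else None
```

The watchdog would call watchdog_step periodically and hand any non-None result to subprocess.run, while the “Extend” button would call extend.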

Hi, thanks for sharing this here.

However, I would ask that you open a ticket on GitHub (OSC/ondemand), as that’s really where we look for and track feature requests. I appreciate you opening this here with so much detail; it’s just likely to get forgotten if it’s only here.

I’m not a fan of this method, as, at least on our cluster, automatically assigning long run times would cause scheduling issues. Perhaps the user could select a minimum time and a maximum time. Is there a way to add a popup over their connection that would ask them to confirm they are still using the session once the minimum time has expired?
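
The min/max idea can be expressed as a small decision function. In this sketch the names check_in_action and Action, the grace period before cancelling, and the rule that each confirmation buys another minimum interval are all assumptions, not anything OnDemand provides; how the popup is rendered over the connection is not addressed.

```python
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class Action(Enum):
    RUN = "run"
    PROMPT = "prompt"   # show "Are you still using this session?"
    CANCEL = "cancel"   # e.g. scancel the job


def check_in_action(start: datetime, now: datetime, min_time: timedelta,
                    max_time: timedelta, grace: timedelta,
                    last_confirm: Optional[datetime] = None) -> Action:
    """Decide what to do with a session at time `now`."""
    if now >= start + max_time:
        return Action.CANCEL  # the user-selected maximum is a hard stop
    # The next check-in is due one min_time after start or after the last confirmation.
    due = (last_confirm or start) + min_time
    if now < due:
        return Action.RUN
    if now < due + grace:
        return Action.PROMPT
    return Action.CANCEL      # the prompt went unanswered for the whole grace period
```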

Another approach is an Extend button that sends an email to the ticketing system to open a time-extension request. We’re usually able to get to it within 24 hours, though in cluster training we stress the importance of giving us at least 48 hours.
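
A minimal sketch of the email variant using Python’s standard email package. The addresses, the username-to-email mapping and the function name are hypothetical; the button only has to collect the job details and hand a well-formed message to the ticketing system, and a person still decides on the extension.

```python
from email.message import EmailMessage

# Hypothetical addresses; a real site would use its own ticketing inbox.
TICKET_ADDRESS = "hpc-help@example.edu"
SENDER_ADDRESS = "ondemand-noreply@example.edu"


def build_extension_request(user: str, job_id: int, cluster: str,
                            extra_hours: int, reason: str) -> EmailMessage:
    """Compose a time-extension ticket for the given job."""
    msg = EmailMessage()
    msg["From"] = SENDER_ADDRESS
    msg["To"] = TICKET_ADDRESS
    msg["Reply-To"] = f"{user}@example.edu"  # assumption: username maps to an address
    msg["Subject"] = f"Time extension request: job {job_id} on {cluster}"
    msg.set_content(
        f"User: {user}\n"
        f"Cluster: {cluster}\n"
        f"Job ID: {job_id}\n"
        f"Requested extension: {extra_hours} hour(s)\n"
        f"Reason: {reason}\n"
    )
    return msg
```

Assuming a local mail relay, sending could be as simple as with smtplib.SMTP("localhost") as s: s.send_message(msg).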

Fancy sudoers manipulation to grant access to admin scontrol (Slurm) sounds like bad form security-wise. I agree with @aneil2: long run times would mess with scheduling, especially for users waiting in queues for small “unsafe” jobs (which can be preempted by priority jobs). We usually have a few hundred of those waiting for resources.

Thanks, Kenny