I’m running Open OnDemand against a reasonably large cluster (3,000 nodes), and every now and then the PBS scheduler gets so busy that it takes a minute or two to respond. When this happens, the “My Interactive Sessions” page times out and fails to load, which understandably upsets our users.
To (possibly) complicate matters, we use the cluster head nodes as a submit_host, so there’s also an ssh hop in the middle.
Is there any way to decouple the sessions page from the scheduler’s performance, so that it at least loads, even if the information it shows isn’t up to date?
(Worst case, I’m thinking I could write a wrapper around the PBS commands that aborts if a command takes more than a few seconds to run, but I’d like to know if there’s something better I could do.)
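For reference, here is a minimal sketch of that wrapper idea in Python. It adds a per-argument cache so that a slow scheduler yields stale output instead of a timeout. Everything site-specific in it is an assumption: the real command path /opt/pbs/bin/qstat, the 5-second limit, the per-user cache under ~/.cache/qstat-wrapper, and the plan of installing the script as `qstat` in a separate directory that OnDemand’s cluster config points at instead of the real PBS bin directory. With submit_host set, the wrapper presumably has to live on the head nodes, since that is where the commands actually run.

```python
#!/usr/bin/env python3
"""Hypothetical qstat wrapper (a sketch, not an Open OnDemand feature).

Runs the real qstat with a timeout. On success the output is cached;
if the scheduler does not answer in time, the last cached output for
the same arguments is returned instead, so the caller gets stale data
rather than a hang.
"""
import hashlib
import os
import subprocess
import sys

# Assumed site-specific values; adjust for your installation.
REAL_QSTAT = "/opt/pbs/bin/qstat"
CACHE_DIR = os.path.expanduser("~/.cache/qstat-wrapper")
TIMEOUT_S = 5.0


def cache_file_for(args):
    """Return one cache file path per distinct argument list."""
    key = hashlib.sha256("\0".join(args).encode()).hexdigest()
    return os.path.join(CACHE_DIR, key)


def run_with_fallback(cmd, args, timeout, cache_file):
    """Run cmd with args; return (stdout, stderr, returncode).

    On timeout, subprocess.run kills the child, and the cached stdout
    from the last successful run is returned, if there is one.
    """
    try:
        proc = subprocess.run([cmd, *args], capture_output=True,
                              text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        try:
            with open(cache_file) as f:
                return f.read(), "", 0
        except FileNotFoundError:
            return "", f"{cmd}: timed out after {timeout}s, no cached output\n", 1
    if proc.returncode == 0:
        # Write to a temp file, then rename, so readers never see a partial file.
        os.makedirs(os.path.dirname(cache_file), exist_ok=True)
        tmp = f"{cache_file}.{os.getpid()}.tmp"
        with open(tmp, "w") as f:
            f.write(proc.stdout)
        os.replace(tmp, cache_file)
    return proc.stdout, proc.stderr, proc.returncode


def main():
    args = sys.argv[1:]
    out, err, rc = run_with_fallback(REAL_QSTAT, args, TIMEOUT_S,
                                     cache_file_for(args))
    sys.stdout.write(out)
    sys.stderr.write(err)
    sys.exit(rc)


# Only act when installed (or symlinked) under the name "qstat".
if __name__ == "__main__" and os.path.basename(sys.argv[0]) == "qstat":
    main()
```

The fallback only makes sense for read-only queries like qstat. Commands with side effects such as qsub or qdel should not be wrapped this way, because replaying cached output would report success for something that never happened. Also note that cached output can still list sessions that have since ended, which is the trade-off of showing stale data.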