We have OnDemand v1.8.18 installed on our cluster.
For most users, interactive apps run without issues. For some users, however, interactive apps (regardless of the application) start and then disappear, even though the jobs are still running on the cluster.
I am wondering whether anyone else has had a similar experience, and whether there are any suggestions on how we can fix it.
Thank you very much for your help.
You can check their /var/log/ondemand-nginx/<user>/error.log for the commands they issue. I'm wondering if they issued an squeue (the Slurm command for getting job information) and it failed for some reason. If it does fail or returns something strange, we tend to mark the job as completed, on the assumption that the scheduler doesn't have that job's information anymore.
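As a rough way to do that check, the sketch below pulls out every line of a user's error.log that mentions squeue so the surrounding entries can be inspected by hand. It assumes only that the command name appears literally in the log; the exact log line format is not assumed, and the helper names `squeue_lines` and `scan_user_log` are made up for this example.

```python
from pathlib import Path
import sys

# Directory taken from the path mentioned above; adjust if your site differs.
LOG_ROOT = Path("/var/log/ondemand-nginx")


def squeue_lines(lines, keyword="squeue"):
    """Return (line_number, text) pairs for lines that contain `keyword`.

    This is a plain substring match; it does not parse the log format.
    """
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if keyword in line
    ]


def scan_user_log(user):
    """Scan /var/log/ondemand-nginx/<user>/error.log for squeue mentions."""
    log_path = LOG_ROOT / user / "error.log"
    # errors="replace" keeps the scan going if the log has odd bytes in it.
    with log_path.open(errors="replace") as handle:
        return squeue_lines(handle)


if __name__ == "__main__" and len(sys.argv) > 1:
    for number, text in scan_user_log(sys.argv[1]):
        print(f"{number}: {text}")
```

Saved under a hypothetical name such as scan_log.py, it would be run as `python3 scan_log.py <user>` by someone with read access to that directory. Comparing the matches for an affected user against those for a user whose apps behave normally may show whether the squeue calls differ.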
Thank you! I will take a look once I have permission to do so.
I will get back to you if I see anything strange.