Apps entering bad state

Hi, I’ve seen a couple of discussion topics related to ‘bad state’ but none seem to quite match up with what we’re experiencing.

Recently, we’ve had a couple of users report that their Jupyter sessions have entered a ‘bad state’ upon launching. So far we don’t really know what has caused this. Looking at the output.log for the affected jobs, we don’t see anything out of the ordinary.

One person reported a ‘bad state’ issue today. After checking the output.log and digging into the issue, I checked the queue, and this job started to run eventually. I confirmed with the user that he was able to log in to the session that previously errored out with ‘bad state’ and it was suddenly working about half an hour after it errored out.

I apologize that I don’t have more information to go on. We haven’t made any changes to OOD recently, and things have otherwise worked pretty much flawlessly. I’d appreciate any thoughts on where to start digging.

No issues. A ‘bad state’ reflects a job state reported by the scheduler. Looking at the source code, there are a couple of states that map directly to what you may be describing.

With Slurm, for example, we use the %t format field from squeue to determine the job state.
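If it helps to see the raw code Slurm is reporting for an affected job, something like the following could be run on a host with the Slurm client tools. This is only a sketch, not how OOD itself calls squeue: the function names and the job ID in the usage note are made up for illustration. It relies on the standard squeue options -h (no header), -j (job ID), and -o (output format).

```python
import subprocess


def squeue_state_command(job_id):
    """Build a squeue call that prints only the compact state code of one job."""
    # -h drops the header line, -j selects the job, -o "%t" prints the compact state.
    return ["squeue", "-h", "-j", str(job_id), "-o", "%t"]


def parse_state(output):
    """Return the first non-empty line of squeue output, or None if there is none."""
    for line in output.splitlines():
        code = line.strip()
        if code:
            return code
    return None


def query_state(job_id):
    """Run squeue for one job (hypothetical helper; needs Slurm installed)."""
    result = subprocess.run(
        squeue_state_command(job_id),
        capture_output=True,
        text=True,
        check=False,
    )
    return parse_state(result.stdout)
```

Calling query_state(12345) (a made-up job ID) while the session shows ‘bad state’ would return the compact code, such as PD or S, that the scheduler is reporting at that moment.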

So, for example, if Slurm returns PR for preempted, OOD may show that as yellow and a ‘bad state’. I can see from the source code that we map that to what the user would see as suspended. We also map S from Slurm directly to suspended. Those are just two examples of mappings from the scheduler to a suspended state in OOD.

Furthermore, any code we can’t directly map from squeue (we only map so many, and Slurm invents more over time) we map to undetermined, which OOD will definitely show the user as a bad state.
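To make the mapping idea concrete, here is a rough sketch of a lookup with a fallback. It is not OOD's actual table: only the PR and S entries come from the description above, the R and PD entries are my own assumptions added for contrast, and the names are hypothetical. It also assumes that suspended and undetermined are the states that surface to the user as ‘bad state’.

```python
# Partial, illustrative mapping from Slurm compact state codes to OOD states.
# Hypothetical: this is not copied from the OOD source.
SLURM_TO_OOD = {
    "PR": "suspended",  # PREEMPTED, as described above
    "S": "suspended",   # SUSPENDED, as described above
    "R": "running",     # assumption, for illustration only
    "PD": "queued",     # assumption, for illustration only
}

# Assumption: these are the OOD states a user sees as a 'bad state'.
BAD_STATES = {"suspended", "undetermined"}


def ood_state(slurm_code):
    """Map a Slurm code to an OOD state, falling back to 'undetermined'."""
    return SLURM_TO_OOD.get(slurm_code.strip(), "undetermined")


def is_bad_state(slurm_code):
    """True if the mapped state would be shown as a 'bad state' (assumed)."""
    return ood_state(slurm_code) in BAD_STATES
```

The fallback is the important part: a code the table does not know about, including any that Slurm adds in a later release, lands in undetermined rather than raising an error.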

So that’s essentially the long and short of it. The scheduler is reporting the state to us, and we’re trying to report that back to the user. No OOD log is going to tell you why this may be the case, but certainly scheduler logs may.

Thanks! This is very helpful. We’ll look at the Slurm logs from now on for more info. Much appreciated.