Greetings – When monitoring the error.log for my account, I am seeing the following repeated in the status:
ERROR “Session specifies nonexistent ‘’ cluster id.”
This began after some network switch reboots about an hour ago.
The shell app launches appropriately. Interactive app test (Jupyter) engages fine with slurm, and receives resources. Attempting to open a kernel succeeds.
Should the ondemand service be restarted? or is there a command option for nginx_stage that would satisfy this failure to recognize (at least in a limited context) what cluster is associated to this ondemand instance?
I’d say try the restart if things were fine before the error message started. Beyond that, it sounds like it is still running fine so maybe this will reload Apache and NGINX and clear the error by letting go of any bad connections that the network blip might have caused.
Hi, Travis – Well, the restart was pretty disruptive. Had to relaunch all my apps – hopefully not too many researchers were affected.
The apache restart also did not resolve the “nonexistent cluster” messages in the error.log
In the future you should not restart a whole production service like that during work hours, I didn’t realize you would do that so I apologize for not clarifying.
Can you say more about the network error that happened? These errors only started after that network blip, you’re sure nothing before that? It is odd that the cluster config would now suddenly not work and the error would be there when nothing has changed.
Some networking equipment, including switches, were rebooted.
However, I have found that in fact the error predates the network switch reboots.
So I have further work to do here.
This topic was automatically closed 180 days after the last reply. New replies are no longer allowed.