Passenger.sock address already in use

I’ve got a user who keeps receiving this error, and I can’t find anything on Google that helps solve it. Can anyone please help? I can log in to OnDemand just fine, but this user continues to receive the error.

OnDemand version: v1.6.25 | Dashboard version: v1.35.3

Something is wrong with their Per-User NGINX (PUN). You can clean the PUN by issuing this command:

sudo /opt/ood/nginx_stage/sbin/nginx_stage nginx_clean --user ch237886

If that doesn’t resolve the issue, issue the command again to ensure the PUN is not running. Use ps or a similar tool to confirm that no nginx process is running for this user. Once you’re sure there are no processes running for this user, remove the socket file /var/run/ondemand-nginx/ch237886/passenger.sock.
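Before removing the socket file, it can help to confirm that nothing is still listening on it. The following is a minimal sketch, not part of OnDemand: `probe_unix_socket` is a hypothetical helper name, and it assumes the script runs as a user allowed to reach the socket path (typically root).

```python
import socket


def probe_unix_socket(path, timeout=1.0):
    """Classify a Unix socket path as 'missing', 'stale', or 'live'.

    Hypothetical helper for illustration only.
    'missing': no file exists at the path.
    'stale':   a file exists but nothing accepts connections on it.
    'live':    some process is still listening on the socket.
    A PermissionError is deliberately not caught, so an unprivileged
    run fails loudly instead of reporting a misleading status.
    """
    client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    client.settimeout(timeout)
    try:
        client.connect(path)
    except FileNotFoundError:
        return "missing"
    except ConnectionRefusedError:
        return "stale"
    finally:
        # Always release the client descriptor, whatever the outcome.
        client.close()
    return "live"
```

Calling `probe_unix_socket("/var/run/ondemand-nginx/ch237886/passenger.sock")` and getting `"live"` would suggest a process for that user is still holding the socket, so removing the file alone is unlikely to help. A result of `"stale"` would suggest the file is a leftover that can be removed.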

I had already cleaned out their processes with the kill command and removed that socket file before reaching out.

Your command gives this error:

[ch230108@ondemand ~]$ sudo /opt/ood/nginx_stage/sbin/nginx_stage nginx_clean --user ch237886
invalid option: --user
Run 'nginx_stage --help' to see a full list of available command line options.

I’ve had them try with a different browser to see if it is a cache thing.

I’m now getting a second user with this same error. The original user needed access for a class, but the class has already started, so the effort to get them online was abandoned because they no longer needed access.

Also, yesterday and today the server returned 502 errors in the morning. I had to reboot the VM; restarting the service did not help.

Issues like this are really hard to track down.

My best guess, from reading the topic below, is that it’s somehow related to the SSSD stack and how you authenticate.

Somehow your users are able to start the PUN, but they’re not able to connect to the socket. Are there any system-level messages (for example in the audit logs or /var/log/messages) that could indicate an issue? Some messages in that topic seem to indicate the fix is as simple as lowercasing your usernames. There seems to be some mismatch between REMOTE_USER (however you pull the username from your authentication system) and the actual Linux user.
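One way to test the mismatch theory is to look up the name exactly as it arrives in REMOTE_USER and compare it with the account name the system returns. This is only a sketch under assumptions: `check_user_mapping` is a hypothetical helper, whether your name service answers lookups case-insensitively depends on how winbind is configured, and the check ignores any user mapping configured in OnDemand itself.

```python
import pwd


def check_user_mapping(remote_user, lookup=pwd.getpwnam):
    """Compare an authenticated name with the account name the system returns.

    Hypothetical helper for illustration only.
    `lookup` defaults to pwd.getpwnam, which raises KeyError when the
    name cannot be resolved; it is a parameter so the check can be
    exercised without real accounts.
    """
    try:
        entry = lookup(remote_user)
    except KeyError:
        return f"{remote_user!r}: no such system user"
    if entry.pw_name != remote_user:
        # The lookup succeeded, but the canonical name differs
        # (for example only in letter case).
        return f"{remote_user!r} resolves to {entry.pw_name!r}: names differ"
    return f"{remote_user!r}: names match"
```

Running `check_user_mapping` with the exact string from the web server’s access log for an affected user, and again for a working user, may show whether the two groups differ in how their names resolve.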

You mean it’s completely down for everyone?

The server came back up, but the reboot didn’t help this situation. Unfortunately, I think this is just going to be a mystery. Our auth is a combination of winbind for passwords and local groups that have to be synced around the cluster. I’d love to be on SSSD, but my predecessor chose a different route.

Sorry I don’t have a better answer for you. We’ve seen this error come up occasionally, but have never been able to replicate it.