502 proxy errors

Good afternoon. Two users in a large environment recently started experiencing 502 proxy errors. Is there a specific place I can check for more details about what is causing the errors?
I have checked the following so far:
/var/log/httpd
ondemand_exporter_access.log
ssl_error_log
ondemand_exporter_error.log
error_log
access_log

/var/log/ondemand-nginx
I also checked the user subdirectories in this location.

The strange thing is that the users reported these errors after opening a console session while opening OOD. Any advice would be appreciated. I can post logs if that would help. Thanks.

/var/log/ondemand-nginx/$USER/error.log would be the best bet here, I think. Then you may want to do a ps check on the host to find their PUNs and see whether the processes are defunct.
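
A rough sketch of that ps check in Python, in case it helps. It assumes a procps-style `ps` that supports the `etimes` field and `--no-headers`, and it guesses that PUN processes either belong to the user or carry the username on their command line. The function names are made up for illustration.

```python
import subprocess

# Assumption: procps-style ps. Fields: pid, parent pid, owner, state,
# elapsed seconds, full command line.
PS_CMD = ["ps", "-eo", "pid,ppid,user,stat,etimes,args", "--no-headers"]


def parse_ps(text):
    """Turn ps output into a list of dicts, one per process."""
    rows = []
    for line in text.splitlines():
        parts = line.split(None, 5)
        if len(parts) < 6 or not parts[0].isdigit():
            continue  # skip blank, header, or malformed lines
        pid, ppid, user, stat, etimes, args = parts
        rows.append({
            "pid": int(pid), "ppid": int(ppid), "user": user,
            "stat": stat, "age_s": int(etimes), "args": args,
        })
    return rows


def pun_processes(rows, username):
    """Keep nginx processes owned by the user or naming the user in their command line."""
    return [
        r for r in rows
        if "nginx" in r["args"] and (r["user"] == username or username in r["args"])
    ]


def is_defunct(row):
    """A state beginning with Z marks a zombie (defunct) process."""
    return row["stat"].startswith("Z")


def live_report(username):
    """Run ps on this host and print the user's nginx processes with their age in days."""
    out = subprocess.run(PS_CMD, capture_output=True, text=True, check=True).stdout
    for r in pun_processes(parse_ps(out), username):
        flag = "DEFUNCT" if is_defunct(r) else "ok"
        print(r["pid"], r["user"], f"{r['age_s'] // 86400}d", flag, r["args"])
```

Calling `live_report("alice")` (a hypothetical username) on the web host prints one line per matching process. Anything flagged DEFUNCT, or many days old, is worth a closer look. A very short username can match unrelated command lines, so treat the output as a starting point.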

Thanks Jeff! OK, I found the PUN for one user from Nov 12th. Should I kill the process and have them retry? Thanks so much!

You should likely kill it, yes. There’s a crontab entry that should have stopped it, but I guess it didn’t? In any case, there’s a command to stop it so that you don’t have to find its process ID and kill it manually.

Thanks so much. Trying now. I appreciate your help.

“nginx-stage” does not seem to exist on my device. I do have an /etc/ood/config/nginx-stage.yml.
Can I just manually kill the process using the kill command?

*Reread your last message. I will kill it manually and have the user retry.*

The full command path is /opt/ood/nginx_stage/sbin/nginx_stage

Welp. The user is now getting a different error: “The server is temporarily unable to service your request due to maintenance downtime or capacity problems. Please try again later.”
Thanks Jeff. Is there a process I need to start now?

You’re now getting a 500 server error instead of a 502. You can look in the same ondemand-nginx logs for details.
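
If the log is noisy, something like the following Python sketch can pull out only the severe nginx entries. It assumes the standard nginx error-log line layout (timestamp, then a bracketed level). Passenger and application output in the same file uses other formats and is simply skipped here. The helper names are illustrative, not part of Open OnDemand.

```python
import re
from pathlib import Path

# Standard nginx error_log lines look like:
#   YYYY/MM/DD HH:MM:SS [level] pid#tid: message
LINE_RE = re.compile(r"^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) \[(\w+)\] (.*)$")
SEVERE = {"error", "crit", "alert", "emerg"}


def severe_entries(lines, levels=SEVERE):
    """Return (timestamp, level, message) tuples for lines at the given levels."""
    found = []
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if m and m.group(2) in levels:
            found.append(m.groups())
    return found


def user_error_log(username, root="/var/log/ondemand-nginx"):
    """Path of the per-user log mentioned in this thread."""
    return Path(root) / username / "error.log"
```

For example, `severe_entries(user_error_log("alice").read_text().splitlines())` would list the entries for a hypothetical user `alice`. An empty result does not prove nothing went wrong, since the request may never have reached that user's nginx.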

You (as the admin) shouldn’t need to start the PUN. But it is much safer to use nginx_stage to stop the PUN and its children. If you used kill <PID> to stop the PUN, you likely left a child running or killed the wrong process.
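
As a sketch, here is one way to wrap that stop call in Python so the exact command is printed before anything runs. It assumes the `nginx` subcommand of nginx_stage accepts `--user` and `--signal` and that `stop` is a valid signal, which is how I understand the Open OnDemand documentation. Please confirm against the command's help output on your version. The wrapper functions themselves are hypothetical.

```python
import subprocess

# Full path as given earlier in this thread.
NGINX_STAGE = "/opt/ood/nginx_stage/sbin/nginx_stage"


def stop_pun_command(username):
    """Build the command that asks nginx_stage to stop one user's PUN.

    Assumption: `nginx --user USER --signal stop` is accepted by your
    nginx_stage version; verify with its help output first.
    """
    if not username or username.startswith("-"):
        raise ValueError("expected a plain username")
    return [NGINX_STAGE, "nginx", "--user", username, "--signal", "stop"]


def stop_pun(username, dry_run=True):
    """Print the command; run it only when dry_run is False (needs root)."""
    cmd = stop_pun_command(username)
    print(" ".join(cmd))
    if dry_run:
        return None
    return subprocess.run(cmd, capture_output=True, text=True)
```

With the default `dry_run=True` the function only prints the command line, so you can review it or paste it into a root shell yourself.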

I was monitoring the error.log when the user tried, and nothing showed up. I killed a root nginx process for the user as well as an nginx process for the user from the 12th.

Hey Jeff. I was able to get the second user going by using the nginx-stage command to kill their defunct job(s). Thank you. Is there anything I can do for the other user, where I killed the processes manually?

I tried to reload and restart the user process with the nginx-stage command and got errors, since I had already killed the processes. I then ran the command again without specifying a “signal” and didn’t get an error. The user is going to test things. Thanks for your help and patience, Jeff.

Glad to hear it! Just let me know if there are more issues.
