Maintenance page without maintenance enabled for one user

Hello Everyone,

One user is unable to use Open OnDemand and always gets the maintenance page.

I do not have maintenance enabled.
No other user has reported the same problem.
I have seen an earlier topic about the same symptom, but in our case a refresh does not resolve it. The user started seeing the maintenance page yesterday, and it has not worked for them since.

I saw this error in the error log for that user.

[ N 2023-02-14 14:30:11.2231 15113/T1 age/Cor/CoreMain.cpp:1340 ]: Starting Passenger core…
[ N 2023-02-14 14:30:11.2232 15113/T1 age/Cor/CoreMain.cpp:256 ]: Passenger core running in multi-application mode.
[ N 2023-02-14 14:30:11.2699 15113/T1 age/Cor/CoreMain.cpp:1015 ]: Passenger core online, PID 15113
[ C 2023-02-14 14:30:11.2701 15110/T1 age/Wat/WatchdogMain.cpp:1454 ]: ERROR: write() failed: Broken pipe (errno=32)
in ‘void reportStartupResult(const WorkingObjectsPtr&, const std::vector<boost::shared_ptr >&)’ (WatchdogMain.cpp:1318)
in ‘int watchdogMain(int, char**)’ (WatchdogMain.cpp:1414)

[ W 2023-02-14 14:30:11.2814 15113/T1 age/Cor/CoreMain.cpp:1236 ]: Watchdog seems to be killed; forcing shutdown of all subprocesses

[ N 2023-02-14 14:30:11.2815 15113/T8 age/Cor/CoreMain.cpp:671 ]: Signal received. Gracefully shutting down… (send signal 2 more time(s) to force shutdown)

May I know if the above is normal?
I do not see any current Passenger processes for that user, and I do not see any new logs created when that user tries to log in.
I do see that authentication succeeds; the maintenance page only appears after the authentication page.
I am using the PAM module for authentication.

May I please know how to rectify this issue?

Here are the OOD versions:
ondemand-2.0.20-1.el7.x86_64
ondemand-nodejs-2.0-1.el7.x86_64
ondemand-python-2.0-1.el7.x86_64
ondemand-ruby-2.0-1.el7.x86_64
ondemand-sqlite-libs-3.26.0-4.el7.x86_64
ondemand-passenger-6.0.7-1.el7.x86_64
ondemand-gems-2.0.20-2.0.20-1.el7.x86_64
ondemand-sqlite-3.26.0-4.el7.x86_64
ondemand-nginx-1.18.0-2.p6.0.7.el7.x86_64
ondemand-apache-2.0-1.el7.x86_64
ondemand-sqlite-devel-3.26.0-4.el7.x86_64
ondemand-runtime-2.0-1.el7.x86_64

Regards,
Lohit

Hi and sorry for the issue.

The first thing I’d point out is it probably is not a good idea to use PAM auth since it is not secure and it would be a better practice to use an Identity Provider.

To your issue, looking at the logs posted I can’t see a connection to OOD there, and those logs may not be what we need given only one user is having this issue.

I’d be more curious about the NGINX logs for that particular user located at /var/log/ondemand-nginx/<user>. Looking at that log for the user, do you see any ERROR or WARN messages?
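
A minimal sketch of that scan, assuming the per-user error log uses the Passenger line layout seen in this thread (`[ LEVEL DATE TIME PID/THREAD file:line ]: message`) and that the single-letter levels W, E, and C mean warning, error, and critical. The function name `find_problems` is hypothetical, not part of OOD or Passenger.

```python
import re

# Matches a prefix such as:
# "[ C 2023-02-14 14:30:11.2701 15110/T1 age/Wat/WatchdogMain.cpp:1454 ]: message"
# Assumption: the level is a single capital letter right after the opening bracket.
LINE_RE = re.compile(
    r"^\[ (?P<level>[A-Z]) (?P<stamp>\S+ \S+) (?P<proc>\S+) (?P<src>\S+) \]: (?P<msg>.*)$"
)

def find_problems(lines, levels=("W", "E", "C")):
    """Return (level, timestamp, message) for each line logged at one of `levels`."""
    hits = []
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if m and m.group("level") in levels:
            hits.append((m.group("level"), m.group("stamp"), m.group("msg")))
    return hits
```

Passing an open file object for the user's error.log to `find_problems` returns only the warning-or-worse entries; continuation lines such as the `in ‘...’` stack frames are skipped because they carry no level prefix.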

Hello Travis,

Thank you. Yes, we are working on moving OnDemand to a better identity provider in the future.
Regarding the logs: the output that I pasted is from the path you mention:

[ C 2023-02-14 14:30:11.2701 15110/T1 age/Wat/WatchdogMain.cpp:1454 ]: ERROR: write() failed: Broken pipe (errno=32)
in ‘void reportStartupResult(const WorkingObjectsPtr&, const std::vector<boost::shared_ptr >&)’ (WatchdogMain.cpp:1318)
in ‘int watchdogMain(int, char**)’ (WatchdogMain.cpp:1414)

[ W 2023-02-14 14:30:11.2814 15113/T1 age/Cor/CoreMain.cpp:1236 ]: Watchdog seems to be killed; forcing shutdown of all subprocesses

[ N 2023-02-14 14:30:11.2815 15113/T8 age/Cor/CoreMain.cpp:671 ]: Signal received. Gracefully shutting down… (send signal 2 more time(s) to force shutdown)

It is from /var/log/ondemand-nginx//error.log

I do not see any other WARN messages, and that is the only ERROR message that I saw.
Also, I do not see any more logs created or updated under this directory after Tuesday.
Even though that user tried to access the site multiple times, I do not see the corresponding access log updated.

It seems that nginx is unable to start another session because something broke earlier?

Regards,
Lohit

:man_facepalming: sorry about that, didn’t notice the log name.

Well, that is interesting. It sounds like they are not getting authenticated, as that won’t show up in the log as a failed login, and so nothing would show up.

I’m wondering if there is something off here with the authentication step for that user, since this works for all others.

How is the auth configured for users? Maybe something is not correct with the user mapping is my initial thought. I wonder if the user_map_cmd is not running a correct regex for the user somehow? Sorry, I’m just a bit unsure as to why only this user would have this.

Hey Travis,

Thank you for helping me debug this.
I do not think it has anything to do with authentication, since I can see in the syslogs that the user has successfully logged in.
Also, I do not see the user_map_cmd breaking.
Another thing to note is that, this user had successfully logged in and was able to use ondemand for quite a while.
It suddenly broke one day, and she was never able to use it since then.
I will keep looking, but any other insights would be greatly helpful.

Regards,
Lohit

Ok, thanks for the info, and sorry for all the questions. I want to make sure I have the right sequence of events. The user previously could log in with no issues; now they log in, but after successful authentication they are shown a maintenance page. Is that correct? I want to be sure that authentication can be ruled out entirely, and the lack of NGINX logs has me wondering if they’ve even made it past Apache at that point.

I know you said you could see in the syslog that the user has logged in, but my main question here is: are they seeing that maintenance page before a login screen or after they’ve submitted the login?

Again, sorry for needing so much clarification, I just really need to understand the sequence here.

Not a problem Travis.
That user is seeing the maintenance page after submitting the login.
They do not see the home page.

However, if the login is cached, then they don't have to explicitly log in again and it will directly show the maintenance page.
We have tried different browsers and incognito windows where it does force them to authenticate, but the same thing happens, where the maintenance page shows up after authentication/log in.

A quick Google search for this error code (errno 32) suggests it’s due to resource exhaustion. I would guess that this user is just unlucky in that they’re the last one to log in and the server has no more to give.
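
For reference, the number in the log line can be decoded with the Python standard library. A small sketch, assuming it is run on the same kind of host (errno numbers are platform specific); the helper name `describe_errno` is hypothetical. It only translates the code and says nothing about what closed the pipe.

```python
import errno
import os

def describe_errno(code):
    """Return (symbolic name, human-readable text) for an errno value on this platform."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)

name, text = describe_errno(32)
print(name, "-", text)
```

On Linux, 32 is EPIPE: a write to a pipe or socket whose reading end has already been closed, which matches the "Broken pipe" text in the log. Whether resource exhaustion or something else closed the other end has to come from other evidence.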

I would check memory first, then maybe open files. Again, I don’t think it’s unique to that user, it’s more of a timing issue. Let’s say it is a memory constraint, it’ll impact whoever’s last to login, or at least it’ll impact whoever cannot allocate the memory they require.

I would check journalctl for more failures. Systemd and/or the kernel should be telling you something about killing a process because it can’t allocate something (more files or it’s out of memory).
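
A rough sketch of the memory and open-files checks, assuming a Linux host whose `/proc/meminfo` exposes `MemTotal` and `MemAvailable`. All function names here are hypothetical, and the open-file limit reported is that of the process running the script, not of the affected user's session.

```python
import resource

def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: int}; most values are in kB."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info

def available_fraction(info):
    """Fraction of total memory reported as available, or None if fields are missing."""
    total = info.get("MemTotal")
    avail = info.get("MemAvailable")
    if not total or avail is None:
        return None
    return avail / total

def nofile_limits():
    """(soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)
```

Feeding the contents of `/proc/meminfo` to `parse_meminfo` and then to `available_fraction` gives a quick number to watch around the time a login fails; a very low value would support the memory theory, while a healthy value points elsewhere.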

Thank you Jeff.
I will see if I can find the cause the next time this happens to another user.
For now, I was able to fix the issue by removing the orphaned passenger.sock file in /var/run/ondemand-nginx//

Regards,
Lohit
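
A hedged sketch of how an orphaned socket file like that could be detected before deleting it, assuming it is a Unix stream socket and that "orphaned" means no process is listening on it any more. The function name is hypothetical, and this is a generic check rather than anything OOD or Passenger ships.

```python
import os
import socket
import stat

def is_stale_unix_socket(path):
    """True if `path` is a Unix socket file that refuses connections (no listener)."""
    try:
        mode = os.lstat(path).st_mode
    except FileNotFoundError:
        return False
    if not stat.S_ISSOCK(mode):
        return False
    probe = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        probe.connect(path)
    except ConnectionRefusedError:
        # The file exists but nothing accepts connections on it.
        return True
    except OSError:
        # Permission errors and similar: cannot tell, so do not call it stale.
        return False
    else:
        return False
    finally:
        probe.close()
```

The check needs enough privileges to reach the socket; if it returns True for the affected user's passenger.sock, removing the file as described above is the manual fix reported in this thread.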
