Proxy Error: The proxy server received an invalid response from an upstream server

rgas20 · May 22, 2023, 4:17pm

Hello,

Last week we encountered an issue with our OnDemand server where each user who logged in would get the following error:

Proxy Error

The proxy server received an invalid response from an upstream server.
The proxy server could not handle the request

Reason: Error reading from remote server

We aren’t sure exactly what may have triggered it. This is what we see in the logs:

[Mon May 22 11:07:42.938803 2023] [proxy:error] [pid 3295] [client IP:52466] AH00898: Error reading from remote server returned by /pun/sys/dashboard, referer: https://adfs.edu/
[Mon May 22 11:07:42.938716 2023] [proxy_http:error] [pid 3295] (70007)The timeout specified has expired: [client IP] AH01102: error reading status line from remote server httpd-UDS:0, referer: https://adfs.edu/

This is what we have in ood_portal.yml other than the servername and ssl:

host_regex: '(node|big-mem|gpu)\d+'
node_uri: '/node'
rnode_uri: '/rnode'

auth:
  - 'AuthType Mellon'
  - 'Require valid-user'

user_map_cmd: '/etc/ood/scripts/lowercase_username'

Is there a way we can extend this timeout? What might be the cause of this? We don’t see any other issues in the logs. In an effort to get back up and running we restored from a backup, but we are having some issues with that restored instance as well, so resolving this would be ideal. Thank you!

jeff.ohrstrom · May 22, 2023, 4:59pm

I’d check in /var/log/ondemand-nginx/$USER/error.log - it appears that you’ve authenticated and we’re trying to start the PUN and can’t communicate with it. It’s a 60 second timeout, so it should be enough.

You can set the timeout through Apache’s Timeout directive (linked below). Just drop a timeout.conf in your conf.d directory and it’ll be a global setting.

https://httpd.apache.org/docs/2.4/mod/core.html#timeout
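
For illustration, a minimal sketch of such a drop-in. It assumes a RHEL-style layout where Apache reads /etc/httpd/conf.d and the service is named httpd; adjust both for your distribution. `CONF_DIR`, `write_timeout_conf`, and the 120-second value are placeholders made up for this example, not OnDemand defaults:

```shell
# Sketch only: CONF_DIR and the example value are assumptions.
CONF_DIR="${CONF_DIR:-/etc/httpd/conf.d}"

# Hypothetical helper: write a one-directive drop-in that sets the global timeout.
# $1 = timeout in seconds
write_timeout_conf() {
  printf 'Timeout %s\n' "$1" > "$CONF_DIR/timeout.conf"
}

# Run as root, then validate the config and restart Apache:
#   write_timeout_conf 120
#   apachectl configtest && systemctl restart httpd
```

Apache only picks up the new file after a restart or reload, so the configtest step is worth running first.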

rgas20 · May 22, 2023, 5:39pm

Jeff,

Thanks, this is what we see in the logs:

in 'bool Passenger::SpawningKit::HandshakePerform::checkCurrentState()' (Perform.h:257)
     in 'void Passenger::SpawningKit::HandshakePerform::waitUntilSpawningFinished(boost::unique_lock<boost::mutex>&)' (Perform.h:213)
     in 'Passenger::SpawningKit::Result Passenger::SpawningKit::HandshakePerform::execute()' (Perform.h:1752)
     in 'Passenger::SpawningKit::Result Passenger::SpawningKit::DirectSpawner::internalSpawn(const AppPoolOptions&, Passenger::SpawningKit::Config&, Passenger::SpawningKit::HandshakeSession&, const Passenger::Json::Value&, Passenger::SpawningKit::JourneyStep&)' (DirectSpawner.h:211)
     in 'virtual Passenger::SpawningKit::Result Passenger::SpawningKit::DirectSpawner::spawn(const AppPoolOptions&)' (DirectSpawner.h:261)
     in 'void Passenger::ApplicationPool2::Group::spawnThreadRealMain(const SpawnerPtr&, const Passenger::ApplicationPool2::Options&, unsigned int)' (SpawningAndRestarting.cpp:95)

It otherwise just fails on “Handshake with subprocess - 90.0s” in the detailed diagnostics.

jeff.ohrstrom · May 22, 2023, 5:42pm

hmmmmmm ok… Not much to see there. Anything interesting before or after that message?

Anything in /var/log/messages or journalctl? How about the top level /var/log/ondemand-nginx/error.log?

rgas20 · May 22, 2023, 6:57pm

No other messages in /var/log/messages or journalctl, unfortunately, nor in the top-level error.log. We took another look at the Apache logs and nothing else stands out except that original timeout error. We may try to get this up and running again with the restored version of the VM.

rgas20 · May 22, 2023, 7:31pm

Strangely, this seems to have something to do with OnDemand 3.0, though we aren’t sure what. If we take our restored VM running OOD 2 we are able to log in without trouble, but if we upgrade that same VM to 3.0 the exact same issue appears. We’re continuing to look into it.

rgas20 · May 23, 2023, 2:46pm

Looks like we get this same exact issue with OOD 3.0 in a fresh VM set up from scratch. We’re going back to 2.0 for the time being.

jeff.ohrstrom · May 24, 2023, 12:51pm

That’s so strange and indeed unfortunate. I’m sorry I don’t have a better answer for you. I’m not really sure where else we could check.

buzh · May 26, 2023, 9:13am

Are you using SELinux? If so, try running semodule -DB to disable any “dontaudit” rules in the standard policy, trigger the fault, and check audit.log for any fresh denied messages.
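
As a rough sketch of that sequence, assuming the default auditd log location /var/log/audit/audit.log and root access (`AUDIT_LOG` and `show_denials` are names made up for this example):

```shell
# Assumption: default auditd log path; override AUDIT_LOG if yours differs.
AUDIT_LOG="${AUDIT_LOG:-/var/log/audit/audit.log}"

# Hypothetical helper: print only the audit records that mention a denial.
show_denials() {
  grep 'denied' "$AUDIT_LOG"
}

# Run as root:
#   semodule -DB      # rebuild the policy with dontaudit rules disabled
#   (log in to OnDemand to trigger the proxy error)
#   show_denials      # look for fresh entries around the time of the failure
#   semodule -B       # rebuild again to re-enable the dontaudit rules
```

Remember the final semodule -B, otherwise the audit log will stay noisy.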

ndusek · June 2, 2023, 2:17pm

We are seeing a similar issue running 3.0.1 on RHEL 8.5. So far I’ve only heard reports of this happening to one user. All I see in their /var/log/ondemand-nginx/$USER/error.log is the following:

[ E 2023-06-02 08:59:32.8974 882831/T2d age/Cor/App/Implementation.cpp:221 ]: Could not spawn process for application /var/www/ood/apps/sys/dashboard: A timeout occurred while spawning an application process.
  Error ID: 4061c76a
  Error details saved to: /tmp/passenger-error-vuoUnL.html

Per one of the earlier suggestions, we bumped the timeout in /etc/httpd/conf.d/timeout.conf to 120 seconds and it’s still timing out.

The only other thing I thought of was that this user had development enabled, and I’ve sometimes seen that a “bad app” in dev can cause issues. But I had him move his dev apps out of his home directory and that didn’t fix it, so I don’t think it’s anything he’s done.

Other than that, I can say we are using CAS authentication. I’m working with our enterprise folks to make sure there are no errors on the CAS side, but I am able to successfully auth with my account, so my hunch is CAS is not the issue.

Happy to provide details from other log files if requested.

ndusek · June 2, 2023, 2:33pm

I guess our issue did have to do with having development enabled. The issue was isolated just to this one user, and when I removed his /var/www/ood/apps/dev/$USER/gateway link, he was able to get into the dashboard again. Then we added that link back in and everything is still working.

Not sure what to make of that, but hopefully it’s an additional data point at least.

rgas20 · June 2, 2023, 2:39pm

We never did get more information on this issue, but I am happy to report that we started over and our fresh install of version 2.X is working fine. We are using ADFS authentication, and the enterprise folks did not see anything out of the ordinary either. To add to the recent replies: we were not using SELinux. We did have dev enabled, but just for one user, and in our case all users suddenly started getting the proxy error one afternoon without any changes to the server. We think it’s especially strange that we hit this issue with OOD 3.0 on a fresh install, so it must be something on our end; our collective IT team just has no idea what. To be honest, we haven’t looked into it further, as it’s been working and other projects have taken priority. We may experiment with another 3.0 instance in the future.

