I installed and configured OnDemand on RHEL 7 via the “official” Puppet module. The deployment also uses Keycloak integration for authN.
A few weeks ago some users started reporting that “from time to time”, when they try to access OOD, they get the maintenance page. They don’t need to do anything to fix it: after waiting a few seconds and refreshing the page, the maintenance page goes away.
At that time I was running 1.7.11. This week I upgraded to 1.7.14, but apparently people are still having the same issue.
It seems to happen quite often (maybe a few times per day).
Do you have any idea of why this might be happening? Do you know how I can investigate this issue? What logs should I look at?
Thank you very much,
Users are saying that when this happens they often need to log in again. Might this issue be related to having non-matching values for the session timeout and idle session timeout in Keycloak and OnDemand? (I had 8h for OnDemand and 4h in Keycloak.)
I don’t think this has anything to do with session settings. The code in the Apache config that handles the maintenance page is told not to show the maintenance page unless /etc/ood/maintenance.enable exists. The fact that you see it without that file is very strange and not something we’ve been able to reproduce. We also use Keycloak with OnDemand and have maintenance enabled.
For now you may want to just disable the maintenance page logic by setting use_maintenance: false in /etc/ood/config/ood_portal.yml and then re-running
Thanks a lot for your message and your incredible Puppet modules. I am a big fan of your work!
Now, about this issue: I was indeed able to reproduce what my users were reporting. These are the steps I followed:
Edit /opt/rh/httpd24/root/etc/httpd/conf.d/auth_openidc.conf and set some ridiculously low values for the OIDC session expiration:
Start an interactive app (in my case a JupyterLab server) and wait until the session expires on its own. You will start getting some errors in the Jupyter notebook. If you open OnDemand again, you will get the maintenance page.
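A rough way to reason about these steps is to model when the Apache-side OIDC session lapses. The Python sketch below assumes the session ends at whichever limit is reached first: an idle limit counted from the last authenticated request (what mod_auth_openidc's OIDCSessionInactivityTimeout controls) or an absolute limit counted from login (OIDCSessionMaxDuration). The helper name and the example times are hypothetical, and this is a simplified model, not the module's actual code.

```python
from datetime import datetime, timedelta


def oidc_session_expiry(login_at: datetime,
                        last_request_at: datetime,
                        inactivity_timeout_s: int,
                        max_duration_s: int) -> datetime:
    """Return when an Apache OIDC session would stop being valid.

    Simplified, assumed model (hypothetical helper):
      * inactivity_timeout_s ~ OIDCSessionInactivityTimeout, counted from
        the last request that used the session;
      * max_duration_s ~ OIDCSessionMaxDuration, counted from login.
    Whichever limit comes first ends the session.
    """
    idle_limit = last_request_at + timedelta(seconds=inactivity_timeout_s)
    hard_limit = login_at + timedelta(seconds=max_duration_s)
    return min(idle_limit, hard_limit)


if __name__ == "__main__":
    login = datetime(2021, 1, 1, 8, 0)
    last_request = datetime(2021, 1, 1, 10, 0)
    # 1h idle limit vs 8h absolute limit: the idle limit is reached first.
    print(oidc_session_expiry(login, last_request, 3600, 28800))  # 2021-01-01 11:00:00
```

Under this model, setting either limit very low makes the session lapse while the interactive app is still in use, so later requests through the portal fail until the user logs in again, which is consistent with the notebook errors described in the steps.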
I think I found the problem. In /opt/rh/httpd24/root/etc/httpd/conf.d/ood-portal.conf, comment out or delete the line that contains ErrorDocument 503 /public/maintenance/index.html and see if the issue goes away. I’ve taken one of our dev instances of OnDemand that is connected to Keycloak and made the same config changes, and have not yet been able to reproduce the problem.
If commenting out ErrorDocument doesn’t help, I’m curious about your session timeout settings on the Keycloak side. The changes we’ve made from the defaults are the following realm properties:
"accessTokenLifespan" : 1800,
"ssoSessionIdleTimeout" : 3600,
"ssoSessionMaxLifespan" : 604800,
The above values come from running something like kcadm.sh get realms.
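To compare realms quickly, a small Python sketch can pull just these three properties out of the JSON that kcadm.sh prints and show them in hours. It assumes the output is either a list of realm objects (as from kcadm.sh get realms) or a single realm object; realm_timeouts and format_timeouts are hypothetical helpers, and the sample realm name is made up while the numbers are the ones listed above.

```python
import json

# Realm properties discussed in this thread (values are in seconds).
TIMEOUT_KEYS = ("accessTokenLifespan", "ssoSessionIdleTimeout",
                "ssoSessionMaxLifespan")


def realm_timeouts(kcadm_json: str) -> dict:
    """Map realm name -> {property: seconds or None} from kcadm.sh JSON."""
    data = json.loads(kcadm_json)
    realms = data if isinstance(data, list) else [data]
    return {
        realm.get("realm", "?"): {key: realm.get(key) for key in TIMEOUT_KEYS}
        for realm in realms
    }


def format_timeouts(timeouts: dict) -> list:
    """Render each property as 'realm: key = Ns (Hh)' for easy comparison."""
    lines = []
    for name, values in timeouts.items():
        for key, seconds in values.items():
            if seconds is None:
                lines.append(f"{name}: {key} is unset")
            else:
                lines.append(f"{name}: {key} = {seconds}s ({seconds / 3600:g}h)")
    return lines


if __name__ == "__main__":
    sample = ('[{"realm": "example", "accessTokenLifespan": 1800, '
              '"ssoSessionIdleTimeout": 3600, "ssoSessionMaxLifespan": 604800}]')
    for line in format_timeouts(realm_timeouts(sample)):
        print(line)
```

Assumed usage: save the kcadm.sh get realms output to a file and pass its contents to realm_timeouts.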
Some other non-standard configs we have set for our OIDC instances:
OIDCStripCookies: mod_auth_openidc_session mod_auth_openidc_session_chunks mod_auth_openidc_session_0
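OIDCStripCookies tells mod_auth_openidc to remove the listed cookies from the incoming request before it is passed on to the backend, so the module's own session cookies are not forwarded to the proxied apps. The Python sketch below only emulates that filtering on a raw Cookie header to illustrate the effect; it is not how the module is implemented, and strip_cookies is a hypothetical name.

```python
# Cookie names copied from the OIDCStripCookies line above.
STRIPPED = {
    "mod_auth_openidc_session",
    "mod_auth_openidc_session_chunks",
    "mod_auth_openidc_session_0",
}


def strip_cookies(cookie_header: str, names=STRIPPED) -> str:
    """Drop the named cookies from a Cookie header, keeping the rest in order."""
    kept = []
    for part in cookie_header.split(";"):
        part = part.strip()
        if not part:
            continue  # ignore empty segments such as a trailing ';'
        name = part.split("=", 1)[0].strip()
        if name not in names:
            kept.append(part)
    return "; ".join(kept)


if __name__ == "__main__":
    header = "mod_auth_openidc_session=abc; _xsrf=1; mod_auth_openidc_session_0=zz"
    print(strip_cookies(header))  # _xsrf=1
```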
I commented out that line and now I am getting the default Apache 503 error page:
The server is temporarily unable to service your request due to maintenance downtime or capacity problems. Please try again later.
My realm values are (I think Keycloak’s defaults or very similar to them):
"accessTokenLifespan" : 3600, (1h)
"ssoSessionIdleTimeout" : 14400, (4h)
"ssoSessionMaxLifespan" : 28800, (8h)
So you are still getting a 503 from the maintenance RewriteRule, but it makes no sense why you’d be getting that. By default there are 3 RewriteCond directives, which are treated as AND conditions. At least 2 of them are going to be true in most cases, but RewriteCond /etc/ood/maintenance.enable -f is going to be false, so the RewriteRule should never be hit. This feels like a bug in mod_auth_openidc, but I have very little to prove that.
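The AND behaviour of the rewrite conditions can be illustrated with a small Python model. Only the flag-file test comes from this thread; the other two conditions (client not on an allow list, request not already for the maintenance page) are an assumption about what the generated maintenance block checks, and maintenance_rule_fires is a hypothetical name.

```python
import os
import tempfile


def maintenance_rule_fires(flag_file: str,
                           client_allowed: bool,
                           is_maintenance_uri: bool) -> bool:
    """Model of the maintenance RewriteRule decision (hypothetical).

    Consecutive RewriteCond lines without [OR] are ANDed, so the rule
    only applies when every condition below is true.
    """
    conditions = (
        os.path.isfile(flag_file),  # RewriteCond <flag_file> -f
        not client_allowed,         # assumed: client not in an allow list
        not is_maintenance_uri,     # assumed: not already the maintenance page
    )
    return all(conditions)


if __name__ == "__main__":
    # Use a throwaway directory instead of the real /etc/ood path.
    with tempfile.TemporaryDirectory() as tmp:
        flag = os.path.join(tmp, "maintenance.enable")
        print(maintenance_rule_fires(flag, False, False))  # False: no flag file
        open(flag, "w").close()
        print(maintenance_rule_fires(flag, False, False))  # True: flag present
```

Because all() requires every element to be true, a missing flag file alone keeps the rule from firing regardless of the other two conditions, which is why a 503 attributed to this rule is surprising.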
Can you remove the maintenance logic with use_maintenance: false in ood_portal.yml and see if the issue goes away? I am curious whether you might hit some other issue that the maintenance rewrite is masking.
I’m also curious whether you have any files with 503 behavior defined; I think you can check with grep -HnR "503" /opt/rh/httpd24/root/etc/httpd/conf*. If you are using the Puppetlabs Apache module, your Apache config is likely very similar to ours and there won’t be any other files with 503 behavior defined.
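If grep is not convenient, the same search can be sketched in Python. This is a rough stand-in for grep -HnR "503" over the conf* entries under the SCL httpd24 directory: it prints matches as path:line:text and follows symlinks like -R, but unlike grep it skips files it cannot read as UTF-8. find_pattern is a hypothetical helper.

```python
import glob
import os


def _scan_file(path, needle):
    """Yield (path, line_number, text) for lines of one file containing needle."""
    try:
        with open(path, encoding="utf-8") as fh:
            for lineno, line in enumerate(fh, start=1):
                if needle in line:
                    yield path, lineno, line.rstrip("\n")
    except (OSError, UnicodeDecodeError):
        # Unreadable or non-UTF-8 files are skipped (grep would still scan them).
        return


def find_pattern(root, needle="503"):
    """Recursively search a file or directory tree, similar to grep -HnR."""
    if os.path.isfile(root):
        yield from _scan_file(root, needle)
        return
    # grep -R follows symlinks, so do the same here.
    for dirpath, dirnames, filenames in os.walk(root, followlinks=True):
        dirnames.sort()  # deterministic traversal order
        for filename in sorted(filenames):
            yield from _scan_file(os.path.join(dirpath, filename), needle)


if __name__ == "__main__":
    base = "/opt/rh/httpd24/root/etc/httpd"  # SCL httpd24 path used in this thread
    for root in sorted(glob.glob(os.path.join(base, "conf*"))):
        for path, lineno, text in find_pattern(root):
            print(f"{path}:{lineno}:{text}")
```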
Hey sorry for the delay. Are you still seeing this issue? Did you ever track down the cause?
Thanks a lot for your message.
I think the root cause was the expiration of the OIDC session.
My IdP (Keycloak) was configured to issue 4h-long tokens, but OOD (through the Apache web server config) was configured to use a shorter session duration.
To solve it, I replaced the default OIDC session values in the OOD Apache config with the same values I had in Keycloak.
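The fix amounts to keeping the Apache-side session lifetimes in line with the Keycloak realm. The Python sketch below checks for drift between the two; it assumes that OIDCSessionInactivityTimeout pairs with ssoSessionIdleTimeout and OIDCSessionMaxDuration with ssoSessionMaxLifespan, since the thread does not say which directives or values were changed. The helper names are hypothetical, and letting the last occurrence of a directive win is only an approximation of how Apache merges config.

```python
import re

# Assumed pairing between mod_auth_openidc directives and Keycloak realm
# properties; adjust if your deployment maps them differently.
PAIRS = {
    "OIDCSessionInactivityTimeout": "ssoSessionIdleTimeout",
    "OIDCSessionMaxDuration": "ssoSessionMaxLifespan",
}

# Matches uncommented lines such as "OIDCSessionMaxDuration 28800".
DIRECTIVE_RE = re.compile(r"^\s*(OIDCSession\w+)\s+(\d+)\s*$")


def apache_oidc_timeouts(conf_text: str) -> dict:
    """Collect OIDCSession* directives with integer values (seconds)."""
    found = {}
    for line in conf_text.splitlines():
        match = DIRECTIVE_RE.match(line)
        if match:
            found[match.group(1)] = int(match.group(2))  # last one wins
    return found


def mismatches(conf_text: str, realm: dict) -> list:
    """Return (directive, apache_value, realm_property, realm_value) that differ.

    A value of None means the directive or property was not found, i.e.
    the module or realm default is in effect.
    """
    apache = apache_oidc_timeouts(conf_text)
    out = []
    for directive, prop in PAIRS.items():
        apache_value, realm_value = apache.get(directive), realm.get(prop)
        if apache_value != realm_value:
            out.append((directive, apache_value, prop, realm_value))
    return out


if __name__ == "__main__":
    conf = "OIDCSessionInactivityTimeout 3600\nOIDCSessionMaxDuration 28800\n"
    realm = {"ssoSessionIdleTimeout": 14400, "ssoSessionMaxLifespan": 28800}
    for row in mismatches(conf, realm):
        print(row)  # ('OIDCSessionInactivityTimeout', 3600, 'ssoSessionIdleTimeout', 14400)
```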
After that change, I (well, actually my users) never experienced this issue again.
Thanks a lot for your follow up message,