First, the deployment I am working with is a more robust test based on some things I did with the HPC Toolkit. We want to test setting up an OOD deployment with Kubernetes/K8s as the main cluster it sits on top of. We’re having some trouble configuring authentication, though. I was wondering if I could share my “ood_portal.yaml” and my dex “config.yaml” files and get some feedback on what I am configuring wrong. I have been testing this by restarting apache and dex and connecting through a locally exposed web connection. I need to be able to log in to OOD before I can actually set up the K8s cluster connection. I’ll go ahead and link the files here. I’m a bit out of my element in understanding how the authentication pieces fit together, but I believe I have all the necessary pieces to make it happen.
What’s the behaviour of the application? I mean, what error messages do you encounter? You should be able to find dex logs in journalctl and apache logs in /var/log/httpd.
I don’t know what the issue is beyond there being one. Can you share some specifics on what’s failing?
The results from the apache log:
==> /var/log/apache2/rc-156-208.rci.uits.iu.edu_error.log <==
[Wed Mar 05 17:51:47.379687 2025] [proxy:error] [pid 1074473:tid 136007437506240] (111)Connection refused: AH00957: http: attempt to connect to 127.0.0.1:5556 (localhost:5556) failed
[Wed Mar 05 17:51:47.379728 2025] [proxy_http:error] [pid 1074473:tid 136007437506240] [client 149.165.156.208:56936] AH01114: HTTP: failed to make connection to backend: localhost
[Wed Mar 05 17:51:47.380359 2025] [auth_openidc:error] [pid 1074472:tid 136007784404672] [client 129.79.197.143:56505] oidc_util_decode_json_object: JSON parsing returned an error: '[' or '{' expected near '<' (<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">\n<html><head>\n<title>503 Service Unavailable</title>\n</head><body>\n<h1>Service Unavailable</h1>\n<p>The server is temporarily unable to service your\nrequest due to maintenance downtime or capacity\nproblems. Please try again later.</p>\n<hr>\n<address>Apache/2.4.58 (Ubuntu) Server at rc-156-208.rci.uits.iu.edu Port 443</address>\n</body></html>\n)
[Wed Mar 05 17:51:47.380389 2025] [auth_openidc:error] [pid 1074472:tid 136007784404672] [client 129.79.197.143:56505] oidc_metadata_provider_retrieve: JSON parsing of retrieved Discovery document failed
[Wed Mar 05 17:51:47.380394 2025] [auth_openidc:error] [pid 1074472:tid 136007784404672] [client 129.79.197.143:56505] oidc_provider_static_config: could not retrieve metadata from url: http://rc-156-208.rci.uits.iu.edu/dex/.well-known/openid-configuration
The ondemand-dex entries in journalctl don’t seem to show much beyond my try-restarts and systemctl status commands. Right now, the current problem is a 500 Internal Server Error, though it has been different at other times. It seems like I’ve walked back any success I’ve had to the furthest point. Restarting apache2 also appears to wipe the /etc/ood/dex/config.yaml file. I didn’t know that until now, and it has likely caused a great deal of confusion about what was and was not working.
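The "Connection refused" entry in the Apache log above means nothing accepted a TCP connection on 127.0.0.1:5556 when Apache tried to proxy to it. A minimal sketch for checking that from the OOD host, using only the Python standard library; the function name `backend_is_listening` is made up for this example, and the host and port defaults are simply the values taken from the log line:

```python
import socket


def backend_is_listening(host: str = "127.0.0.1", port: int = 5556,
                         timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection raises OSError (e.g. ConnectionRefusedError)
        # when nothing is listening or the attempt times out.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    state = "accepting connections" if backend_is_listening() else "not reachable"
    print(f"127.0.0.1:5556 is {state}")
```

If this reports "not reachable" right after a restart, the proxy errors and the failed discovery lookup in the log are probably symptoms of the backend service not running rather than of a problem in the Apache configuration itself.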
Okay, we have a host set up at https://rc-156-208.rci.uits.iu.edu/, which we can connect to and which shows an authentication page. The test credentials properly take us to the “Grant Access” page. However, when we click the “Grant Access” button, we are not redirected properly. A page displaying the following text appears at the url: https://rc-156-208.rci.uits.iu.edu/oidc?code=(etc)
My question is: what have I missed to make this redirect work properly? Where should it be sending me, and what should I expect? Based on the apache logs, it seems that the client is improperly specified, but I’m not sure which portion.
==> /var/log/apache2/rc-156-208.rci.uits.iu.edu_error.log <==
[Wed Mar 05 18:30:33.532053 2025] [auth_openidc:error] [pid 1074473:tid 136007445898944] [client 129.79.197.143:56723] oidc_util_json_string_print: oidc_util_check_json_error: response contained an "error" entry with value: ""invalid_client"", referer: https://rc-156-208.rci.uits.iu.edu/dex/approval?req=secjqbffcrlqvnqbfbc6kvmtx&hmac=6G2cQMhi3FxTF-AyqUJfxbB8uOkHCBHHDFdiAbwoV8I
[Wed Mar 05 18:30:33.532075 2025] [auth_openidc:error] [pid 1074473:tid 136007445898944] [client 129.79.197.143:56723] oidc_util_json_string_print: oidc_util_check_json_error: response contained an "error_description" entry with value: ""Invalid client credentials."", referer: https://rc-156-208.rci.uits.iu.edu/dex/approval?req=secjqbffcrlqvnqbfbc6kvmtx&hmac=6G2cQMhi3FxTF-AyqUJfxbB8uOkHCBHHDFdiAbwoV8I
[Wed Mar 05 18:30:33.532082 2025] [auth_openidc:error] [pid 1074473:tid 136007445898944] [client 129.79.197.143:56723] oidc_proto_resolve_code_and_validate_response: failed to resolve the code, referer: https://rc-156-208.rci.uits.iu.edu/dex/approval?req=secjqbffcrlqvnqbfbc6kvmtx&hmac=6G2cQMhi3FxTF-AyqUJfxbB8uOkHCBHHDFdiAbwoV8I
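In OAuth 2.0, an `invalid_client` error from the token endpoint means client authentication failed, so the client ID or secret that Apache presented was not accepted by the provider. A small sketch for pulling the `error` and `error_description` values out of log lines like the ones above; the function name and the regular expression are written for this example only and assume the exact message wording shown in these three lines:

```python
import re

# Matches: response contained an "error" entry with value: ""invalid_client""
# The value is wrapped in one or more double quotes in the log output.
ENTRY_RE = re.compile(
    r'response contained an "(?P<key>[^"]+)" entry with value: "+(?P<value>[^"]*)"+'
)


def extract_oidc_errors(lines):
    """Collect the key/value pairs reported by auth_openidc error lines."""
    found = {}
    for line in lines:
        match = ENTRY_RE.search(line)
        if match:
            found[match.group("key")] = match.group("value")
    return found
```

Feeding it the contents of the error log (for example `extract_oidc_errors(open(path))`, with `path` pointing at the vhost error log) gives a compact summary of what the provider rejected.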
OK - so first off, you shouldn’t be editing /etc/ood/dex/config.yaml directly. Supply the configuration in ood_portal.yml and let it do its thing to set everything up.
You seem to be following the same strategy the demo container uses, so I’ll link those files.
Here’s its ood_portal.yml
But note that you also need this modification to let you do this at all (it’s insecure, that’s why you need this special flag here).
I see, so most/all of the values I modified in the dex config.yaml can and should go in the ood_portal.yml file. Is there an easy way to ‘reset’ the config.yaml then? I’d like to take it back to square one.
Done. Just to be sure, when I want to test any changes I’ve made to ood_portal.yml, should I be restarting httpd (apache in my case), ondemand-dex, or should I just need to refresh?
Okay, when trying to restart apache2 when the servername is rc-156-208.rci.uits.iu.edu, https://rc-156-208.rci.uits.iu.edu, or commented out, I get an error AH00558: apache2: Could not reliably determine the server's fully qualified domain name, using <ip>. Set the 'ServerName' directive globally to suppress this message. Is ondemand automatically trying to fill in a servername that I can see somewhere? Maybe I misunderstand what it is asking for there.
I tried removing the http(s) portion. That did not change anything. Here’s some output from systemctl:
Mar 06 17:02:52 ood update_ood_portal[1155776]: /opt/ood/ood-portal-generator/lib/ood_portal_generator/application.rb:201:in `update_ood_portal'
Mar 06 17:02:52 ood update_ood_portal[1155776]: /opt/ood/ood-portal-generator/lib/ood_portal_generator/application.rb:305:in `start'
Mar 06 17:02:52 ood update_ood_portal[1155776]: -e:1:in `<main>'
Mar 06 17:02:52 ood update_ood_portal[1155776]: Run 'update_ood_portal --help' to see a full list of available options.
Mar 06 17:02:52 ood apachectl[1155784]: AH00558: apache2: Could not reliably determine the server's fully qualified domain name, using 192.168.7.158. Set the 'ServerName' directive globally to suppress this message
Mar 06 17:02:52 ood apachectl[1155784]: (2)No such file or directory: AH02291: Cannot access directory '/etc/apache2/logs/' for error log of vhost defined at /etc/apache2/sites-enabled/ood-portal.conf:45
Mar 06 17:02:52 ood apachectl[1155784]: AH00014: Configuration check failed
Mar 06 17:02:53 ood systemd[1]: apache2.service: Control process exited, code=exited, status=1/FAILURE
Mar 06 17:02:53 ood systemd[1]: apache2.service: Failed with result 'exit-code'.
Mar 06 17:02:53 ood systemd[1]: Failed to start apache2.service - The Apache HTTP Server.
I’m not sure where to get that stack trace from exactly, but here are the results of journalctl | grep ood:
Mar 06 17:57:23 ood systemd[1]: Starting apache2.service - The Apache HTTP Server...
Mar 06 17:57:23 ood update_ood_portal[1158732]: Backing up previous Apache config to: '/etc/apache2/sites-available/ood-portal.conf.20250306T175723'
Mar 06 17:57:23 ood update_ood_portal[1158732]: Generating new Apache config at: '/etc/apache2/sites-available/ood-portal.conf'
Mar 06 17:57:23 ood update_ood_portal[1158732]: chown root:www-data /etc/apache2/sites-available/ood-portal.conf
Mar 06 17:57:23 ood update_ood_portal[1158732]: chmod 640 /etc/apache2/sites-available/ood-portal.conf
Mar 06 17:57:23 ood update_ood_portal[1158732]: Generating Apache config checksum file: '/etc/ood/config/ood_portal.sha256sum'
Mar 06 17:57:23 ood update_ood_portal[1158732]: Backing up previous Dex config to: '/etc/ood/dex/config.yaml.20250306T175723'
Mar 06 17:57:23 ood update_ood_portal[1158732]: mv /etc/ood/dex/config.yaml /etc/ood/dex/config.yaml.20250306T175723
Mar 06 17:57:23 ood update_ood_portal[1158732]: Generating new Dex config at: /etc/ood/dex/config.yaml
Mar 06 17:57:23 ood update_ood_portal[1158732]: mv /tmp/dex_config20250306-1158732-c7mw98 /etc/ood/dex/config.yaml
Mar 06 17:57:23 ood update_ood_portal[1158732]: chown ondemand-dex:ondemand-dex /etc/ood/dex/config.yaml
Mar 06 17:57:23 ood update_ood_portal[1158732]: chmod 600 /etc/ood/dex/config.yaml
Mar 06 17:57:23 ood apachectl[1158746]: AH00558: apache2: Could not reliably determine the server's fully qualified domain name, using 192.168.7.158. Set the 'ServerName' directive globally to suppress this message
Mar 06 17:57:23 ood apachectl[1158746]: (2)No such file or directory: AH02291: Cannot access directory '/etc/apache2/logs/' for error log of vhost defined at /etc/apache2/sites-enabled/ood-portal.conf:45
Mar 06 17:57:23 ood apachectl[1158746]: AH00014: Configuration check failed
Mar 06 17:57:23 ood systemd[1]: apache2.service: Control process exited, code=exited, status=1/FAILURE
Mar 06 17:57:23 ood systemd[1]: apache2.service: Failed with result 'exit-code'.
Mar 06 17:57:23 ood systemd[1]: Failed to start apache2.service - The Apache HTTP Server.
Mar 06 17:57:23 ood sudo[1158728]: pam_unix(sudo:session): session closed for user root
Mar 06 17:57:25 ood systemd[1]: run-docker-runtime\x2drunc-moby-be2d7645148cee4c4ae08353fd2dd75146a4f93edd67778ad4e5ce98b772a10e-runc.sNfQtM.mount: Deactivated successfully.
Mar 06 17:57:26 ood sudo[1158770]: ubuntu : TTY=pts/5 ; PWD=/home/ubuntu ; USER=root ; COMMAND=/usr/bin/journalctl
Mar 06 17:57:26 ood sudo[1158770]: pam_unix(sudo:session): session opened for user root(uid=0) by ubuntu(uid=1000)
The AH02291 error (Cannot access directory '/etc/apache2/logs/') appears to be the fatal one. Not sure why Ubuntu/Debian doesn’t create that directory for you (I’m fairly sure that’s the default directory it would log to without any configuration).
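To separate the noise from the real failure in output like the journal excerpts above, it can help to filter on Apache's AH message codes. The sketch below is only an illustration: the function name is made up, and the `ADVISORY` set is an assumption that contains just AH00558, the ServerName notice that Apache itself describes as a message to suppress.

```python
import re

# Apache tags its messages with codes such as AH02291.
AH_RE = re.compile(r"\b(AH\d{5}): (.*)")

# Assumption for this sketch: codes treated as advisory rather than fatal.
ADVISORY = {"AH00558"}


def apache_findings(lines):
    """Return (code, message) pairs for AH-coded lines that are not advisory."""
    findings = []
    for line in lines:
        match = AH_RE.search(line)
        if match and match.group(1) not in ADVISORY:
            findings.append((match.group(1), match.group(2).strip()))
    return findings
```

Run against the journal lines above, this leaves AH02291 (the missing log directory) followed by AH00014 (configuration check failed), which matches the reading that the missing directory is what stops apache2 from starting.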
Thanks for your help so far. I was able to get the login page to display correctly again; however, after logging in, I get a bad request error page. The journalctl output for ood doesn’t seem very helpful here. Is there another term you would suggest checking?
Mar 07 14:55:40 ood sudo[1227165]: pam_unix(sudo:session): session opened for user root(uid=0) by ubuntu(uid=1000)
Mar 07 14:55:44 ood ondemand-dex[1227063]: time=2025-03-07T14:55:44.895Z level=INFO msg="login successful" connector_id=local username=admin preferred_username="" email=admin@test.com groups=[] request_id=60b809b3-a9cf-4b47-9405-b2d370c11ed0
Mar 07 14:55:52 ood kernel: [TTM] Buffer eviction failed
Mar 07 14:55:52 ood kernel: qxl 0000:00:01.0: object_init failed for (4096, 0x00000001)
Mar 07 14:55:52 ood kernel: [drm:qxl_alloc_bo_reserved [qxl]] *ERROR* failed to allocate VRAM BO
I left journalctl -f running, and that error happens even when I’m not actively trying to connect to the page. I imagine it’s unrelated.
Is there a list/definition somewhere of which dex values need to be configured in the ood_portal.yml file? I’m thinking I’ve misconfigured something with the redirects.
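One way to narrow down a redirect or issuer problem is to look at the provider's discovery document, which is the `/dex/.well-known/openid-configuration` URL that appeared in the earlier Apache log. The sketch below is a rough check, not an OOD tool: the function names are invented for this example, the field list is the set the OpenID Connect Discovery spec marks as required (`token_endpoint` is required unless only the implicit flow is used), and the expected issuer is whatever you believe the portal is configured to use.

```python
import json
import urllib.request

# Fields the OpenID Connect Discovery spec requires in provider metadata.
REQUIRED_FIELDS = (
    "issuer",
    "authorization_endpoint",
    "token_endpoint",
    "jwks_uri",
    "response_types_supported",
    "subject_types_supported",
    "id_token_signing_alg_values_supported",
)


def discovery_url(issuer: str) -> str:
    """Build the well-known discovery URL for an issuer."""
    return issuer.rstrip("/") + "/.well-known/openid-configuration"


def fetch_discovery(issuer: str, timeout: float = 5.0) -> dict:
    """Download and parse the discovery document (needs network access)."""
    with urllib.request.urlopen(discovery_url(issuer), timeout=timeout) as resp:
        return json.load(resp)


def discovery_problems(doc: dict, expected_issuer: str) -> list:
    """List missing required fields and any issuer mismatch."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in doc]
    issuer = doc.get("issuer")
    if issuer is not None and issuer != expected_issuer:
        problems.append(
            f"issuer mismatch: document says {issuer!r}, expected {expected_issuer!r}"
        )
    return problems
```

For example, `discovery_problems(fetch_discovery(issuer), issuer)` with `issuer` set to the portal's `/dex` URL returns an empty list when the document is complete and the issuer matches exactly. An `http` versus `https` difference in the issuer would show up here as a mismatch.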