I have a new install of OOD v4.0.8 which I am installing from scratch on a RHEL9.6 – which I am performing strictly following the instructions at Installation — Open OnDemand 4.0.0 documentation (besides authentication, which is done with LDAP/Dex in the same way I did for an older version which works just fine).
The authentication seems to be working (wrong user/passwd are rejected, good ones are let in), however upon successful login I get
Looking at /var/log/httpd/<URL redacted>_error_ssl.log I see the following, which does have a few strange things
[Tue Nov 25 17:07:42.553615 2025] [auth_openidc:warn] [pid 1123068:tid 1123244] [client 10.1.1.254:59584] oidc_check_x_forwarded_hdr: header X-Forwarded-Proto received but OIDCXForwardedHeaders not configured for it, referer: https://<URL redacted>:8443/dex/auth/ldap/login?back=&state=guac4ceuxybfmwhu4coapxr7t
[Tue Nov 25 17:07:42.553866 2025] [proxy:error] [pid 1123068:tid 1123244] (2)No such file or directory: AH02454: http: attempt to connect to Unix domain socket /var/run/ondemand-nginx/davide/passenger.sock (localhost:80) failed
[Tue Nov 25 17:07:42.553885 2025] [proxy_http:error] [pid 1123068:tid 1123244] [client 10.1.1.254:59584] AH01114: HTTP: failed to make connection to backend: localhost, referer: https://<URL redacted>:8443/dex/auth/ldap/login?back=&state=guac4ceuxybfmwhu4coapxr7t
[Tue Nov 25 17:07:42.554821 2025] [lua:info] [pid 1123068:tid 1123244] [client 10.1.1.254:59584] allowed_hosts="<URL redacted>" req_is_https="true" req_status="503" req_hostname="<URL redacted>" res_content_location="" local_user="davide" req_port="8443" req_referer="<URL redacted>:8443/dex/auth/ldap/login" req_method="GET" time_user_map="1.997" res_content_disp="" remote_user="Davide" req_origin="" res_content_type="text/html; charset=iso-8859-1" log_time="2025-11-25T23:07:42.554730.0Z" req_accept_language="en-us,en;q=0.5" req_content_type="" req_filename="proxy:http://localhost/pun/sys/dashboard" req_user_agent="Mozilla/5.0 (X11; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0" time_proxy="1.182" res_location="" req_uri="/pun/sys/dashboard" res_content_encoding="" res_content_length="299" req_server_name="<URL redacted>" res_content_language="" req_cache_control="" log_hook="ood" req_is_websocket="false" log_id="aSY2vkzTsf3yG2FKqZmhpgAAAJc" req_accept="text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8" req_handler="proxy:unix:/var/run/ondemand-nginx/davide/passenger.sock|http://localhost" req_accept_charset="" req_accept_encoding="gzip, deflate, br, zstd" req_protocol="HTTP/1.1" req_user_ip="10.1.1.254", referer: <URL redacted>:8443/dex/auth/ldap/login?back=&state=guac4ceuxybfmwhu4coapxr7t
namely:
it uses localhost rather than the actual URL in a few places
has my username with an uppercase (but I suspect that is an ActiveDirectory-ism and I have seen it in other contexts too where it did not cause issues there)
tries to open a socket against port 80 which fails (if I point my browser to that URL that gets redirected to port 8443 and is not found either)
/var/run/ondemand-nginx/ exists but does not have any subdirectory under it
running the following (which I saw in /var/log/secure is happening at that time) completes without message and exit with code 0
[root ~] # tail -3 /var/log/secure
Jan 8 21:41:11 hostname sudo[3727779]: apache : PWD=/ ; USER=root ; COMMAND=/opt/ood/nginx_stage/sbin/nginx_stage pun -u davide -a https%3a%2f%2fURL_REDACTED%3a8443%2fnginx%2finit%3fredir%3d%24http_x_forwarded_escaped_uri
Jan 8 21:41:11 hostname sudo[3727779]: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=48)
Jan 8 21:41:11 hostname sudo[3727779]: pam_unix(sudo:session): session closed for user root
[root ~] # /opt/ood/nginx_stage/sbin/nginx_stage pun -u davide -a https%3a%2f%2fURL_REDACTED%3a8443%2fnginx%2finit%3fredir%3d%24http_x_forwarded_escaped_uri
[root ~] # echo $?
0
[root ~] #
None of these change a thing, I always get that Service Unavailable message in the browser and that extremely annoying No such file or directory error in the logs, and most frustratingly no clue why!
Reading about similar problem online such as this or this other suggests that some service may be not installed or not configured correctly or something. However I cannot find anything relevant missing (those reports are for PHP which obviously is not applicable here), unless there is something missing in the OnDemand documentation.
For example what should create that /var/run/ondemand-nginx/davide/passenger.sock, and when? Not the php-fpm service obviously. I can’t find any information about that. Does anybody here know? Is there a way that I could crank the verbosity of the logs to the maximum to get any clue why that socket is not created?
Sorry for the long delay. Booting the PUN should create that domain socket you’ve listed there. If the file isn’t there, then I’d guess the PUN doesn’t boot.
I’d check if you have selinux running and any messages in /var/log/audit that prohibit nginx from starting up. Nginx should be creating this file to listen on.
Instead, I can tell that I did not edit anything in /etc/ood/config/nginx_stage.yml which is basically all commented out. Could that be the root cause of the problem? If not, how do I check if PUN did boot, and how do I attempt to force it booting, so that I can see whatever problem it encountered?
[root@node30 ~]# ps aux | grep pun
root 3781997 0.0 0.0 3876 1792 pts/0 S+ 10:28 0:00 grep --color=auto pun
[root@node30 ~]# ps aux | grep -i pun
root 3782015 0.0 0.0 3876 1792 pts/0 S+ 10:28 0:00 grep --color=auto -i pun
[root@node30 ~]# ps aux | grep -i nginx
root 3782019 0.0 0.0 3876 1792 pts/0 S+ 10:29 0:00 grep --color=auto -i nginx
[root@node30 ~]# ll /var/log/ondemand-nginx/
total 0
I’m totally lost here: the last time I installed OnDemand was v1.8, IIRC, and many things have changed. So I am following the instructions at Installation — Open OnDemand 4.0.0 documentation and I think I followed them diligently.
I have the services for httpd and ondemand-dex started and they are running. Their logs report:
running pun as I wrote above (with that nginx_stage command) and
correctly find the users in LDAP (in fact I can login with the correct AD username and password but not connect if I intentionally mispell either or both)
As I wrote before, if I run /opt/ood/nginx_stage/sbin/nginx_stage pun -u davide -a https%3a%2f%2fURL_REDACTED%3a8443%2fnginx%2finit%3fredir%3d%24http_x_forwarded_escaped_uri it does absolutely nothing and exit immediately with exit code 0 (and leaves nothing running behind)
OK, having nothing in /var/log/ondemand-nginx is very strange indeed.
Now I guess we have to go lower in dmesg or journalctl to see if there’s anything that the Linux kernel itself would be reporting on. I’m thinking maybe something like out of memory errors or similar. Like the process boots up correctly (becuase the exit status is 0) but then is quickly terminated by the kernel for some reason.
[root@node30 ~]# free -h
total used free shared buff/cache available
Mem: 503Gi 7.0Gi 384Gi 284Mi 115Gi 496Gi
Swap: 9.3Gi 0B 9.3Gi
No traces of OOMs anywhere in the kernel logs, and nothing timestamped as today in dmesg
Very light logging in messages which seems irrelevant to me.
[root@node30 ~]# tail -30 /var/log/messages
Jan 9 09:57:47 node30 systemd[1]: session-c105.scope: Deactivated successfully.
Jan 9 10:00:02 node30 systemd[1]: Starting system activity accounting tool...
Jan 9 10:00:02 node30 systemd[1]: sysstat-collect.service: Deactivated successfully.
Jan 9 10:00:02 node30 systemd[1]: Finished system activity accounting tool.
Jan 9 10:10:02 node30 systemd[1]: Starting system activity accounting tool...
Jan 9 10:10:02 node30 systemd[1]: sysstat-collect.service: Deactivated successfully.
Jan 9 10:10:02 node30 systemd[1]: Finished system activity accounting tool.
Jan 9 10:20:02 node30 systemd[1]: Starting system activity accounting tool...
Jan 9 10:20:02 node30 systemd[1]: sysstat-collect.service: Deactivated successfully.
Jan 9 10:20:02 node30 systemd[1]: Finished system activity accounting tool.
Jan 9 10:30:02 node30 systemd[1]: Starting system activity accounting tool...
Jan 9 10:30:02 node30 systemd[1]: sysstat-collect.service: Deactivated successfully.
Jan 9 10:30:02 node30 systemd[1]: Finished system activity accounting tool.
Jan 9 10:35:54 node30 systemd[1]: Starting dnf makecache...
Jan 9 10:35:54 node30 dnf[3782497]: Updating Subscription Management repositories.
Jan 9 10:35:55 node30 dnf[3782497]: Metadata cache refreshed recently.
Jan 9 10:35:55 node30 systemd[1]: dnf-makecache.service: Deactivated successfully.
Jan 9 10:35:55 node30 systemd[1]: Finished dnf makecache.
Jan 9 10:40:02 node30 systemd[1]: Starting system activity accounting tool...
Jan 9 10:40:02 node30 systemd[1]: sysstat-collect.service: Deactivated successfully.
Jan 9 10:40:02 node30 systemd[1]: Finished system activity accounting tool.
Jan 9 10:50:02 node30 systemd[1]: Starting system activity accounting tool...
Jan 9 10:50:02 node30 systemd[1]: sysstat-collect.service: Deactivated successfully.
Jan 9 10:50:02 node30 systemd[1]: Finished system activity accounting tool.
Jan 9 11:00:02 node30 systemd[1]: Starting system activity accounting tool...
Jan 9 11:00:02 node30 systemd[1]: sysstat-collect.service: Deactivated successfully.
Jan 9 11:00:02 node30 systemd[1]: Finished system activity accounting tool.
Jan 9 11:10:02 node30 systemd[1]: Starting system activity accounting tool...
Jan 9 11:10:02 node30 systemd[1]: sysstat-collect.service: Deactivated successfully.
Jan 9 11:10:02 node30 systemd[1]: Finished system activity accounting tool.
On the other hand journalctl does provide some info. Most of that is regular things such as ondemand-dex[3726393]: time=2026-01-09T08:59:23.120-07:00 level=INFO msg="performing ldap search some other is the same pun command I saw and pasted earlier (twice already, so I’m not copy-pasting that part again) with that PAM session opened and closed not even one second apart.
which appears to be repeated every couple of hours.
Now since also this seems to be a dead end, I opened that nginx_stage file and here it is!!
#!/usr/bin/env bash
# The purpose of this script is to wrap up the necessary environment for the
# per-user NGINX (PUN) processes to run under. The PUN requires the ondemand
# Software Collection.
# Root directory for this library
#
ROOT_DIR="$(dirname "$(dirname "$(readlink -f "${BASH_SOURCE[0]}")")")"
# Source in the default environment
# shellcheck source=/etc/profile
source "${ROOT_DIR}/etc/profile"
# Allow admin to override the environment the PUN runs in
#
NGINX_PROFILE=${NGINX_PROFILE:-/etc/ood/profile}
if [[ -f "${NGINX_PROFILE}" ]]; then
# Source in the custom environment
# shellcheck source=nginx_stage/etc/profile
source "${NGINX_PROFILE}"
fi
# Environment is set, so call Ruby now
#
#exec \
# /usr/bin/env ruby \
# -I"${ROOT_DIR}/lib" \
# -rnginx_stage \
# -e "NginxStage::Application.start" \
# -- "${@}"
This appears to be to be doing nothing? Is this how it is supposed to be? Why is the invocation to Ruby commented out? I did not commented out myself, I simply installed ondemand from the repos…