Hello everyone,
I am currently trying to set up Open OnDemand to test it out, and I ran into an issue while configuring it.
I am using the ondemand-4.0.0-1.el8.x86_64 RPM package on Rocky 8.10.
I have SSSD and munge configured and running. Open OnDemand is linked to a Keycloak instance via OIDC.
SELinux is in permissive mode.
sudo su - tdelmas works from the host Open OnDemand is running on.
Users’ home directories are mounted under /home/users/ (mount -t lustre).
When I log in to Open OnDemand, I get a "home directory not set up" error message.
Now the weird thing is that if, from the node hosting Open OnDemand, I run
sudo su - tdelmas
and then go back to the web page and click restart webserver, I get to the dashboard, see my files, can even upload some, and can launch jobs against our cluster.
I am a bit at a loss as to how or where I should look for additional logging information.
I can reliably trigger this issue (doing an nginx_clean -f, trying to log back in, getting the no home directory error, impersonating myself again, etc.).
I have a feeling the problem lives somewhere between nginx_stage, SSSD and PAM, but I can’t figure it out just yet.
Any help is appreciated, as are pointers to documentation I should read through again.
Thanks in advance,
Thibault.
Copying Slack message:
Hi, thanks for the reply. Unfortunately the logs are not showing any errors, even after I set lua_log_level to debug.
In the httpd logs I mainly see things like:
[Tue Feb 25 15:57:23.492911 2025] [lua:debug] [pid 27099:tid 140467534538496] lua_request.c(1850): [client 10.2.0.62:45918] AH01486: request_rec->dispat
and nothing that looks like an error.
There are only 200 responses in tdelmas/access_log
and Passenger shutdown/startup messages in the error logs.
Hello and welcome! Sorry for the issue.
A first step toward surfacing errors may be to run the PUN from the CLI before you log in, to see whether any errors around home directories or SSSD show up.
To do this, you can consult the nginx_stage docs here:
https://osc.github.io/ood-documentation/latest/reference/commands/nginx-stage/usage.html
Try just creating the configs and starting the process from the CLI with the following command:
sudo nginx_stage pun --user 'some_user'
I would also tail the logs to see if anything jumps out while trying this:
tail -f /var/log/ondemand-nginx/<user>/error.log
Sure:
[root@ ~]# /opt/ood/nginx_stage/sbin/nginx_stage nginx_clean -f
[root@ ~]# /opt/ood/nginx_stage/sbin/nginx_stage pun --user tdelmas
[root@ ~]# tail -f /var/log/ondemand-nginx/tdelmas/error.log
[ N 2025-02-26 09:30:56.3304 31908/T1 age/Wat/WatchdogMain.cpp:1370 ]: Starting Passenger watchdog...
[ N 2025-02-26 09:30:56.3560 31912/T1 age/Cor/CoreMain.cpp:1341 ]: Starting Passenger core...
[ N 2025-02-26 09:30:56.3562 31912/T1 age/Cor/CoreMain.cpp:257 ]: Passenger core running in multi-application mode.
[ N 2025-02-26 09:30:56.4799 31912/T1 age/Cor/CoreMain.cpp:1016 ]: Passenger core online, PID 31912
As I mentioned, I haven’t found many options to make it more verbose.
strace /opt/ood/nginx_stage/sbin/nginx_stage pun --user tdelmas 2>&1 | grep home
stat("/home/users/tdelmas", 0x7ffc13445740) = -1 EACCES (Permission denied)
Running strace, it appears that the nginx process itself tries to stat the user’s home directory. But we are running root_squash on our (Lustre) file system. Could that be related?
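A stat on /home/users/tdelmas needs search (execute) permission on every parent directory, and with root_squash root is mapped to an unprivileged user, so the "other" permission bits are the ones that matter. The sketch below is one way to list parents that lack that bit. The function name blocked_parents is hypothetical. It assumes bash and GNU stat, assumes the squashed user is neither the owner nor in the group of these directories, and ignores ACLs.

```shell
# List every parent directory of an absolute path that "other" users
# cannot traverse (no execute/search bit in the last mode digit).
# Returns 0 if none are blocked, 1 if at least one is, 2 on error.
blocked_parents() {
    local dir mode status=0
    case $1 in
        /*) ;;                       # absolute path: OK
        *)  echo "need an absolute path" >&2; return 2 ;;
    esac
    dir=$(dirname -- "$1")           # stat needs search permission on parents only
    while :; do
        mode=$(stat -c '%a' -- "$dir") || return 2
        # Last octal digit holds the "other" bits; bit 1 is execute/search.
        if (( (8#${mode: -1} & 1) == 0 )); then
            echo "$dir"
            status=1
        fi
        [ "$dir" = "/" ] && break
        dir=$(dirname -- "$dir")
    done
    return "$status"
}
```

For example, blocked_parents /home/users/tdelmas prints /home/users if that directory has a mode such as 750, and prints nothing if every parent is traversable by "other".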
Thanks for doing that. I’m working off these docs here to understand this better: lustre_manual_markdown/03.18-Managing Security in a Lustre File System.md at master · echofoo/lustre_manual_markdown · GitHub
Looking at the output and reading more, the issue is likely root_squash: NGINX runs as root to accomplish its tasks, so it looks to be hitting root_squash and failing.
I’m not totally sure of a work-around here. @tdockendorf do you have any experience with this type of issue or insights?
OSC also runs root_squash on our NFS home, so I don’t think that’s the whole issue. If I had to guess, it’s the permissions of /home/users. At OSC we have things like this:
$ ls -la /users
total 592
<SNIP>
drwxr-xr-x 9 root root 4096 Nov 28 2023 sysp
Then inside directories like sysp is the actual home, which is 0700 by default, but because the parent is accessible to root, root can at least stat the subdirectory.
Example under sysp:
drwxr-x--- 154 djohnson PZS0708 73728 Feb 25 09:14 djohnson
$ sudo -u root stat /users/sysp/djohnson
File: /users/sysp/djohnson
<SNIP>
However going deeper won’t work:
$ sudo -u root ls -la /users/sysp/djohnson/
ls: cannot open directory '/users/sysp/djohnson/': Permission denied
Thank you very much for helping out. It turns out that, since nginx stats the user’s home directory as root, nobody (the root-squashed user) needs at least read and execute permission on the parent directory (chmod XX5 /home/users) for the stat to succeed.
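The permission change described above could look like the following sketch. The function name open_parent_for_squashed_root is hypothetical. It uses chmod o+rx, which adds read and execute for "other" without touching the owner and group bits. It assumes GNU stat, and it assumes it is run by the directory owner or from a host where root is not squashed, since a squashed root would presumably not be allowed to change the mode.

```shell
# Grant "other" read and search permission on a parent directory,
# printing the mode and ownership before and after the change.
open_parent_for_squashed_root() {
    local parent=$1
    echo "before: $(stat -c '%A %U:%G' -- "$parent")"
    chmod o+rx -- "$parent"
    echo "after:  $(stat -c '%A %U:%G' -- "$parent")"
}
```

For example, run open_parent_for_squashed_root /home/users, then confirm with sudo -u nobody stat /home/users/tdelmas, assuming the squashed user is nobody as in the post above.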