Apache's sudo call to start PUN times out

buzh · February 17, 2023, 8:56am

Hello from rainy Oslo, Norway! First of all, thank you for this amazing project! I’ve been setting up OnDemand for our local cluster and it’s been going well - but I’ve hit one problem that I’m having a hard time figuring out.

When a user logs in “cold” (no PUN running), there is always a ~30 second delay.

Long story short, it’s caused by something systemd-related timing out trying to sudo:

Feb 17 09:10:19 ood-dev01.educloud.no sudo[417843]:   apache : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/opt/ood/nginx_stage/sbin/nginx_stage pun -u ec-buzh -a https%3a%2f%2food.educloud.no%3a443%2fnginx%2finit
Feb 17 09:10:20 ood-dev01.educloud.no systemd[1]: Started Session c1 of user root.
Feb 17 09:10:45 ood-dev01.educloud.no sudo[417843]: pam_systemd(sudo:session): Failed to create session: Connection timed out
Feb 17 09:10:45 ood-dev01.educloud.no sudo[417843]: pam_unix(sudo:session): session opened for user root by (uid=0)
Feb 17 09:10:45 ood-dev01.educloud.no sudo[417843]: pam_unix(sudo:session): session closed for user root
Feb 17 09:10:46 ood-dev01.educloud.no httpd[65310]: oida + exec
Feb 17 09:10:46 ood-dev01.educloud.no httpd[65310]: + exec

As you can see, there is a 25 second delay from “Started Session” until the pam_systemd(sudo:session) timeout. Once the PUN is running, everything is fine - it’s just that sudo call that delays things.

The “oida + exec” is there because I added some debug output to pun_proxy.lua trying to pin down where it happened:

    err = nginx_stage.pun(r, pun_stage_cmd, user, app_init_url, pun_pre_hook_exports, pun_pre_hook_root_cmd)
    if err then
      r.usleep(1000000) -- sleep for 1 second before trying again
      print("oida" .. " " .. err)
    end

I was initially thinking it happened in the lua script, but it turns out it must be something in systemd-logind or something related. It looks very similar to this old systemd issue:

github.com/systemd/systemd

"pam_systemd(login:session): Failed to create session: Connection timed out" on boot (pam_systemd should use a much longer timeout for the OpenSession bus call, since it starts the --user systemd instance which might block for a long time)

opened 12:26AM - 18 Mar 16 UTC

closed 12:48PM - 24 Oct 17 UTC

Sohalt

login

### Submission type - [x] Bug report - [ ] Request for enhancement (RFE) ### sys…temd version the issue has been seen with > 229 ### Used distribution > ArchLinux 64bit ### In case of bug report: Expected behaviour you didn't see > normal execution of systemd-logind (setup of $XDG_SESSION, $XDG_VTNR, etc) ### In case of bug report: Unexpected behaviour you saw > login on the tty takes unexpectedly long (I don't use a dm) and then drops me into the tty instead of starting Xorg. This is because $XDG_VTNR is not set and I start Xorg with `[[ -z $DISPLAY && $XDG_VTNR -eq 1 ]] && startx ]]`. The journal says `login[608]: pam_systemd(login:session): Failed to create session: Connection timed out` directly before `login[608]: LOGIN ON tty1 BY sohalt`. > When I switch to tty2 and log in there $XDG_VTNR is correctly set to 2. Switching back to tty1 logging out and logging in again also sets the correct variables on tty1. ### In case of bug report: Steps to reproduce the problem > The problem only occurs when I enable a user service that starts Emacs in daemon mode and then don't log in immediately after the login prompt appears (i.e. wait ~10sec). When I disable the Emacs unit I can login without problems even after waiting some time.

However, it’s not exactly the same. Also it seems nobody else on the internet ever had that happen with sudo:session (or at least that’s what Google says).

I tried adjusting dbus so that the timeouts that were around 25 seconds were shorter, hoping that it would help me further isolate the problem, but without luck - the sudo is still delayed by 25s every time.

Any ideas about what this could be?

I’m running RHEL 8.7 in vmware with OnDemand 2.0.29-1.

Best,
Andreas at the University of Oslo dept of Scientific Computing Services.

jeff.ohrstrom · February 17, 2023, 3:02pm

Hi and welcome!

Thank you for the detailed description.

I’m seeing the same results from google. That said, since it is a sudo:session I’m wondering if you can replicate with any of these commands?

sudo ls
sudo -u root ls

# or switch into root and issue sudo
sudo su - root 
sudo ls

It would seem the latter set of commands would really replicate. That is, the former are

pam_unix(sudo:session): session opened for user root by johrstrom(uid=0)

I.e., We need a sudo session opened for user root by user root, not the unprivileged user.

If that doesn’t replicate directly, here’s a suggestion on adding debugging in the OnDemand stack. Note that you should reset this all after you’re done. The idea here is to build some sort of wrapper script and execute that instead of nginx_stage directly.

Here’s the setting you need to modify.

pun_stage_cmd: /some/wrapper/that/can/strace

With a strace script that will strace sudo.

#!/bin/bash
# contents of  /some/wrapper/that/can/strace

strace -o /tmp/sudo_strace.out sudo /opt/ood/nginx_stage/sbin/nginx_stage $@

Note that you’ll have to modify your sudo rules to allow for this. Be sure to reset after you’ve completed debugging. Indeed you seem to be in the cloud, so I’d say re-roll the entire machine if you can.

https://osc.github.io/ood-documentation/latest/reference/files/ood-portal-yml.html

buzh · February 20, 2023, 12:09pm

Thanks for the suggestions, Jeff.

Strace logs didn’t help me much, but I eventually figured out what was going on: selinux!

The denials were filtered out by the default settings on EL8, but after doing semodule -DB it turned out that httpd_t was denied the net_admin capability. I just confirmed that allow httpd_t self:capability net_admin removes the delay I was seeing.

I’ll be setting up another OOD instance from scratch in a week or two, and I’ll check to see if the same problems happens, and if I can I will update this issue so others can benefit. For now I basically did:

semodule -DB
# log into a fresh session here
audit2allow -a -M ood_sudo
semodule -i ood_sudo.pp
semodule -B

jeff.ohrstrom · February 20, 2023, 2:24pm

of course it’s something simple. Glad to hear you got it figured out.

system · August 19, 2023, 2:24pm

This topic was automatically closed 180 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Httpd24-httpd.service start-pre operation timed out. Terminating Get Help	5	578	October 8, 2023
"pun_pre_hook_root_cmd" execution error Get Help	11	116	January 20, 2025
Error -- Permission denied @ dir_s_mkdir - /var/run/ondemand-nginx Get Help	4	1603	May 26, 2022
Active User on OnDemand Interface Get Help	3	615	May 26, 2022
Frequency of SLURM calls from OOD Get Help	4	154	September 10, 2024

Apache's sudo call to start PUN times out

Related topics