User does not exist - recurring for some users

Hi Folks,

We are down to one final issue preventing us from going live. Sometimes when a valid user logs on to OOD, they get the following message instead of their OOD home page. It never happens to me, but some users seem to be affected repeatedly.

Error – user does not exist: fredfoo
Run nginx_stage --help to see the full list of command options.

I seem to be able to fix this temporarily by running the command shown in these log entries:

Jul 16 17:21:22 vmpr-res-cluster1 sudo: apache : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/opt/ood/nginx_stage/sbin/nginx_stage pun -u fredfoo -a https%3a%2f%2fresearch-cluster.petermac.org.au%3a443%2fnginx%2finit%3fredir%3d%24http_x_forwarded_escaped_uri

Jul 16 17:21:23 vmpr-res-cluster1 sudo: apache : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/opt/ood/nginx_stage/sbin/nginx_stage pun -u fredfoo -a https%3a%2f%2fresearch-cluster.petermac.org.au%3a443%2fnginx%2finit%3fredir%3d%24http_x_forwarded_escaped_uri

Jul 16 17:21:24 vmpr-res-cluster1 sudo: apache : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/opt/ood/nginx_stage/sbin/nginx_stage pun -u fredfoo -a https%3a%2f%2fresearch-cluster.petermac.org.au%3a443%2fnginx%2finit%3fredir%3d%24http_x_forwarded_escaped_uri

Running this fixes it, at least for a short time:

/opt/ood/nginx_stage/sbin/nginx_stage pun -u fredfoo -a https%3a%2f%2fresearch-cluster.petermac.org.au%3a443%2fnginx%2finit%3fredir%3d%24http_x_forwarded_escaped_uri

Our authentication stack is mod_authnz_pam, pwauth, and SSSD to IPA, and finally AD, which is our upstream source directory service.

It happens with all browsers.

  1. Can someone please help me understand where in the login process things may be failing, and where to look?

  2. Are there any known issues?

  3. Is there any way to increase the relevant debugging output?

  4. These users can SSH into the login node that OOD runs on without issue.

  5. Stopping and starting the services does not fix the issue.

  6. CentOS 7.7, OOD version 1.6.22, Dashboard v1.35.3

Thx

Unfortunately, I think that error bubbles up from nginx, though I can’t actually find that string anywhere. I tried to replicate it and got something slightly different. If you check the file /var/lib/ondemand-nginx/config/puns/$USER.conf, you’ll see something like user jeff 'jeff'; as the first line. /var/log/ondemand-nginx/$USER/error.log may also have something in it. I believe this error is thrown when nginx tries to start processes as this user.
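A quick way to act on that first check is sketched below in Python. It is a minimal sketch and not part of OOD: it assumes the PUN config lives in the directory named above and contains an nginx user directive like user jeff 'jeff'; (nginx's syntax is user NAME [GROUP];), and the helper names parse_user_directive and check_pun_user are hypothetical. It then asks the system whether that user resolves via getpwnam, using Python's pwd module.

```python
import pwd
import re
from pathlib import Path

# Directory for per-user PUN configs, as mentioned in the thread (assumed path).
PUN_CONFIG_DIR = Path("/var/lib/ondemand-nginx/config/puns")

# Matches an nginx "user" directive such as:  user jeff 'jeff';
# Quotes around either name are optional; the group is optional.
USER_DIRECTIVE = re.compile(r"""^\s*user\s+'?([^'\s;]+)'?(?:\s+'?([^'\s;]+)'?)?\s*;""")


def parse_user_directive(line):
    """Return (user, group) from an nginx user directive; group may be None."""
    m = USER_DIRECTIVE.match(line)
    if not m:
        raise ValueError(f"not an nginx user directive: {line!r}")
    return m.group(1), m.group(2)


def check_pun_user(username, config_dir=PUN_CONFIG_DIR):
    """Find the user directive in <config_dir>/<username>.conf and report
    whether the user it names can be resolved with getpwnam right now."""
    conf = Path(config_dir) / f"{username}.conf"
    for line in conf.read_text().splitlines():
        if USER_DIRECTIVE.match(line):
            user, group = parse_user_directive(line)
            try:
                pwd.getpwnam(user)
                return user, group, True
            except KeyError:  # pwd raises KeyError when no entry is found
                return user, group, False
    raise ValueError(f"no user directive found in {conf}")
```

For example, running check_pun_user("fredfoo") as root right after a failed login should show whether the name in the config resolves at that moment.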

My guess is that you’ll also see errors from SSSD or PAM in journalctl or /var/log/messages.

The fact that these users can SSH in gives me pause, and the fact that you can temporarily fix the issue even more so. We’ve seen ‘user not found’ type issues before, but it’s generally because the LDAP queries are misconfigured. Our libraries just bubble up whatever errors they come across, and ‘not found’ is more typical of LDAP, whereas “doesn’t exist”, though similar, comes from some other library.

I would suggest these questions for debugging: What’s the difference in authentication between SSH and OOD (are mod_authnz_pam and pwauth additional hops)? It always works for you, and only sometimes for these other users, so could there be caching somewhere that’s failing? What system errors are being thrown (/var/log/httpd/, SSSD, IPA, mod_authnz_pam)?

In fact, I can’t even replicate your issue: if I try on a test instance, I get “can't find user for foo”, which is slightly different. In your case, we were somehow able to find the user through getpwnam but then failed somewhere else. Maybe that’s another argument for caching in some layer?
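If caching in some layer is the culprit, the lookup should succeed sometimes and fail at other times. A minimal Python sketch to test that idea is below. It repeatedly calls getpwnam through Python's pwd module, which should go through the same NSS path (and so, on this setup, SSSD) as other processes on the host. The function name probe_user, the script name, and the default attempt count and delay are hypothetical choices, not anything from OOD.

```python
import pwd
import sys
import time


def probe_user(username, attempts=20, delay=0.5):
    """Call getpwnam (via the pwd module) repeatedly and tally the results.

    An intermittent NSS/SSSD problem would show up as a mix of
    'found' and 'missing' rather than all of one kind."""
    found = missing = 0
    for i in range(attempts):
        try:
            pwd.getpwnam(username)
            found += 1
        except KeyError:  # raised when no passwd entry is returned
            missing += 1
        if delay and i < attempts - 1:
            time.sleep(delay)  # spread the lookups out over time
    return {"found": found, "missing": missing}


if __name__ == "__main__":
    # Usage (hypothetical script name): python3 probe_user.py USER [USER ...]
    for name in sys.argv[1:]:
        print(name, probe_user(name))
```

Running it as the apache user (for example sudo -u apache python3 probe_user.py fredfoo) while an affected user is seeing the error would show whether the name fails to resolve at the NSS level or only further along.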

So far, nothing seems to show up in the logs. I am wondering if we are intermittently losing an argument on the command line, since the error tells us to run --help (see the example in my previous message).

regards,

Christopher Welsh
On his mobile.

Hi Jeff, could I show you or your team a live example with an affected user? Perhaps I could share my screen at a pre-arranged time?

Yes, we can meet. You can email me directly at johrstrom@osc.edu to set it up.

In the interim, I’d still look through all the logs in your auth stack, not only for errors but also to rule layers out and to see what each layer says.

Make a list of what each layer sees. Clearly you’re able to authenticate through Apache, and Apache believes they’re a real user.

It’s when we go to start processes as another user that we run into issues. You can run this command and it works, but presumably you run it as root. What happens when you run it as apache?

sudo -u apache sudo /opt/ood/nginx_stage/sbin/nginx_stage pun -u fredfoo -a https%3a%2f%2fresearch-cluster.petermac.org.au%3a443%2fnginx%2finit%3fredir%3d%24http_x_forwarded_escaped_uri

Maybe running strace on this will tell us where it’s failing.
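One thing worth knowing when doing this: by default strace only traces the process it starts, and sudo typically runs the target command in a child process, so a plain trace of a sudo-wrapped command may only show sudo itself. strace's -f flag follows child processes. The Python sketch below only builds the command line and does not run anything; the helper name build_strace_cmd and the script name are hypothetical, and the resulting command is meant to be run as root.

```python
import shlex
import sys

NGINX_STAGE = "/opt/ood/nginx_stage/sbin/nginx_stage"


def build_strace_cmd(user, app_uri, outfile, as_apache=False):
    """Return an argv list that runs 'nginx_stage pun' under strace.

    -f follows forked children (sudo -> bash -> ruby), which a plain
    'strace sudo ...' does not; -o writes the trace to outfile.
    as_apache=True adds the 'sudo -u apache sudo' wrapper from the thread.
    """
    cmd = ["strace", "-f", "-o", outfile]
    if as_apache:
        cmd += ["sudo", "-u", "apache", "sudo"]
    cmd += [NGINX_STAGE, "pun", "-u", user, "-a", app_uri]
    return cmd


if __name__ == "__main__":
    # Usage (hypothetical script name): python3 build_strace.py USER APP_URI OUTFILE
    if len(sys.argv) == 4:
        print(shlex.join(build_strace_cmd(*sys.argv[1:4])))
```

Printing the command rather than executing it keeps the sketch harmless; the printed line can be pasted into a root shell.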

Hi Jeff and all,
It’s been a long time, but I’m finally getting back to this. Would you believe this only happens to a few people? Two of them are power users who cannot use OOD because of it. Everyone else is fine.

Anyway, I have set up another test account and dumped a trace file. Note that when I run the command manually, it works. If I then log out, close the browser, and run “./nginx_stage nginx_clean”, the problem reoccurs. I have tried adding this test account as a local user on the OOD host, and I still get “Error – user does not exist: fredfoo
Run nginx_stage --help to see the full list of command options.”

Running the command manually works until a reboot, or I guess until a cleanup runs.

strace -o /tmp/ood-keam sudo -u apache sudo /opt/ood/nginx_stage/sbin/nginx_stage pun -u skeam -a https%3a%2f%2fresearch-cluster.petermac.org.au%3a443%2fnginx%2finit%3fredir%3d%24http_x_forwarded_escaped_uri

Please, does anyone know where to look for this?

I would paste the strace, but it’s quite a lot of lines, as you can imagine. Is there a preferred way I could send it to you? Many Thx

EDIT: To clarify, I don’t even need to run the “./nginx_stage nginx_clean” command; simply logging out, closing the browser, and logging back in gets me back to square one. Also note that “/var/log/ondemand-nginx/skeam/error.log” has no entries, so this issue must happen early in the process, i.e. because the user is not found at OOD login. Thx

I just enabled uploading .txt files (along with yml and json). We have plenty of storage too (though it can’t be more than a couple kb?).

In any case, you can just upload it to this topic if you like. If not, maybe pastebin could work. Feel free to remove or obfuscate whatever you need.

Thx, here is the attachment from:

strace -o /tmp/ood-keam sudo -u apache sudo /opt/ood/nginx_stage/sbin/nginx_stage pun -u skeam -a https%3a%2f%2ffredo-cluster%3a443%2fnginx%2finit%3fredir%3d%24http_x_forwarded_escaped_uri

ood-keam.txt (113.5 KB)

Hi Guys,

Any insights?

Thanks for the attachment, still looking into it.

There’s something very funny going on here with the context in which these straces are being captured.

Initially I tried to replicate it to get a strace log like yours and got something very different. Then I saw you created this from the command line, so I replicated that, and I believe that’s the wrong approach. As far as I can tell, what you have in the strace log is OK: you never hit the Ruby code in nginx_stage; indeed, you never even hit the bash code in the sbin/nginx_stage bash script! And it exits 0, meaning it was a good exit.

I think you’re getting a strace of the sudo command itself and not a strace of /opt/ood/nginx_stage/sbin/nginx_stage command.

I think you should run the strace command as root directly instead of sudo -u apache. That is, su - root and then run strace.

Do a simple search for “ruby” in the output and you should see all sorts of hits. I think that’s the strace output we’re looking for.
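To make that search a little more systematic, a minimal Python sketch like the following lists every execve call in an strace output file and counts the lines mentioning "ruby". It assumes strace's usual line format, where output taken with -f is prefixed with a PID (bare when written with -o, as [pid N] on stderr); the function name summarize_strace is hypothetical. If the result shows only sudo's own execve and no ruby hits, the trace most likely did not follow the child processes.

```python
import re

# With -f, strace typically prefixes lines with the PID: a bare number when
# writing to a file (-o), or "[pid N]" on stderr. Without -f there is no prefix.
EXECVE_RE = re.compile(
    r'^(?:\[pid\s+(?P<pid1>\d+)\]|(?P<pid2>\d+))?\s*execve\("(?P<path>[^"]+)"'
)


def summarize_strace(lines):
    """Return ([(pid, path), ...] for each execve line, count of lines mentioning 'ruby').

    pid is None when the line has no PID prefix (trace taken without -f)."""
    execs = []
    ruby_hits = 0
    for line in lines:
        m = EXECVE_RE.match(line)
        if m:
            execs.append((m.group("pid1") or m.group("pid2"), m.group("path")))
        if "ruby" in line:
            ruby_hits += 1
    return execs, ruby_hits
```

For example, with open("/tmp/ood-keam") as f: execs, hits = summarize_strace(f) gives the chain of programs that were started and a rough sense of whether Ruby was reached at all.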


This should be resolved in 2.0, or at least 2.0 gives you a better error message that tells you what you need to reconfigure in the SSSD stack.