OIDC+SSSD error on login

Hi!

I’m trying to get OOD working on our Ubuntu system, but running into an issue with OIDC auth.

Authentication itself succeeds, but the step right after login fails with the following error:

Error -- can't find group for 1543400513
Run 'nginx_stage --help' to see a full list of available command line options.

We’re using SSSD to sync groups from our AD server. 1543400513 is the “Domain Users” group that all users are a part of. How would I get around this issue? There’s no need for it to check that particular group as SSSD ignores it for the most part anyways.

Any help is appreciated.

Thanks,
Kurt

What are you requesting from the OIDC scope? I tend to request “profile” and “email”, email usually being what I map to in ldap.

profile, email, eduperson_entitlement, and eduperson_scoped_affiliation.

I don’t think there’s necessarily anything wrong with the OIDC implementation because it seems to be getting past that.

This may be fixed in 4.0. It seems like you need to add a rescue in this block. The change comes from Pull Request #3795 in OSC/ondemand ("Allow UID to be returned by the mapper script." by guruevi), though you don’t need all the changes in that PR.

The file should be /opt/ood/nginx_stage/lib/nginx_stage/user.rb. If you hot patch your system please take a backup of the original file.

Am I supposed to be running something other than update_nginx_stage to apply the user.rb change? After adding those lines, it still gives the same error.

EDIT: One thing to note is that the 1543400513 group doesn’t have a group name, only the GID. Could that be the reason why it’s erroring out? It can’t grab the name?
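
One way to test that theory outside of OOD, as a minimal sketch: Ruby's `Etc.getgrgid` raises `ArgumentError` with the same "can't find group for ..." wording when a GID has no group entry. The helper name `group_name_for` is hypothetical and is not part of nginx_stage.

```ruby
require 'etc'

# Hypothetical helper (not part of nginx_stage): return the group name for a
# GID, or nil when the system has no group entry for it.
def group_name_for(gid)
  Etc.getgrgid(gid).name
rescue ArgumentError => e
  # Etc.getgrgid raises ArgumentError ("can't find group for <gid>")
  # when the lookup finds nothing.
  warn "lookup failed: #{e.message}"
  nil
end

# Check the GID given on the command line, or the primary GID of the
# current process when none is given.
gid = (ARGV[0] || Process.gid).to_i
name = group_name_for(gid)
puts name ? "GID #{gid} resolves to #{name}" : "GID #{gid} has no group name"
```

Running it as the affected user, or passing 1543400513 as the argument, should show whether the lookup itself is what fails.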

You shouldn’t need to run anything - the program will run when you login.

I’m trying to figure out why we’re collecting the groups at all. I can’t see it being called anywhere in the codebase - so maybe we just stop making that call altogether?

Try commenting out that line here or assign it to an empty variable like @groups = [].

diff --git a/nginx_stage/lib/nginx_stage/user.rb b/nginx_stage/lib/nginx_stage/user.rb
index 6c0eb83d8..93f6ad271 100644
--- a/nginx_stage/lib/nginx_stage/user.rb
+++ b/nginx_stage/lib/nginx_stage/user.rb
@@ -63,7 +63,7 @@ module NginxStage
         end
       end
       @group = Etc.getgrgid gid
-      @groups = get_groups
+      # @groups = get_groups
     end
 
     # User's primary group name

That line doesn’t seem to be present in 3.1. Should I migrate to a nightly build of OOD?

It’s in a different spot in 3.1, but it’s still there. No, I don’t think you should update to nightly, because who knows what other errors you may run into.

Yea I think longer term (the next release) I’m just going to pull this group functionality.

It appears it was added here, to support validating groups when sharing apps.

But this was then removed at some point that I’m having trouble tracking down.

I still receive the same error after commenting out the get_groups line. The only thing that gets the error to change is if I comment out line 37, which results in this error:

Error -- undefined method `group' for #<NginxStage::User: <user info>

It does, however, only return the 1543400513 group instead of all the secondary groups.

:man_facepalming: 1543400513 is your primary group?

OK, maybe try this diff, where we catch the error and just return the number for your primary group.

--- a/nginx_stage/lib/nginx_stage/user.rb
+++ b/nginx_stage/lib/nginx_stage/user.rb
@@ -34,8 +34,12 @@ module NginxStage
     # @raise [ArgumentError] if user or primary group doesn't exist on local system
     def initialize(user)
       @passwd = Etc.getpwnam user.to_s
-      @group = Etc.getgrgid gid
-      @groups = get_groups
+      begin
+        @group = Etc.getgrgid gid
+      rescue
+        @group = gid.to_s
+      end
+      # @groups = get_groups
 
       if name.to_s != user.to_s
         err_msg = <<~HEREDOC

I get "Error -- undefined method `name' for "1543400513":String" after applying the diff.

Unfortunately, due to the way our AD servers are configured, everyone’s primary group is 1543400513. We work off of secondary groups for everything on our cluster.

@jeff.ohrstrom Is it possible that OOD is expecting all group names to return a string and it’s freaking out because the GID is returned instead?

OOD is expecting that @group variable to be an Etc::Group object. Maybe we can cast to an openstruct to fake it out.

Does a patch like this work? Looks like there’s also an API for mem but I don’t see any consumers of that API.

diff --git a/nginx_stage/lib/nginx_stage/user.rb b/nginx_stage/lib/nginx_stage/user.rb
index 0bef299f..f6d7238a 100644
--- a/nginx_stage/lib/nginx_stage/user.rb
+++ b/nginx_stage/lib/nginx_stage/user.rb
@@ -1,4 +1,5 @@
 require 'forwardable'
+require 'ostruct'
 
 module NginxStage
   # A String-like Class that includes helper methods to better describe the
@@ -34,8 +35,12 @@ module NginxStage
     # @raise [ArgumentError] if user or primary group doesn't exist on local system
     def initialize(user)
       @passwd = Etc.getpwnam user.to_s
-      @group = Etc.getgrgid gid
-      @groups = get_groups
+      begin
+        @group = Etc.getgrgid gid
+      rescue
+        @group = OpenStruct.new(:name => gid.to_s, :to_s => gid.to_s)
+      end
+      # @groups = get_groups
 
       if name.to_s != user.to_s
         err_msg = <<~HEREDOC
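
For illustration only, the same fallback idea can be sketched with a plain `Struct` instead of `OpenStruct`, which keeps `to_s` an ordinary method definition. `FallbackGroup` and `primary_group_for` are hypothetical names, and the set of readers (`name`, `gid`, `mem`) is an assumption about what callers use. This only changes what the Ruby side sees; anything else that resolves the group by name is unaffected.

```ruby
require 'etc'

# Hypothetical stand-in for Etc::Group, exposing only the readers this sketch
# assumes callers need (name, gid, mem).
FallbackGroup = Struct.new(:name, :gid, :mem) do
  # Render as the group name, which here is the GID as a string.
  def to_s
    name
  end
end

# Hypothetical helper: look up the group for a GID, falling back to a struct
# whose name is the GID rendered as a string when no entry exists.
def primary_group_for(gid)
  Etc.getgrgid(gid)
rescue ArgumentError
  FallbackGroup.new(gid.to_s, gid, [])
end

group = primary_group_for(Process.gid)
puts "#{group.name} (#{group.gid})"
```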

Got past that error! A new one though:

Error -- nginx: [emerg] getgrnam("1543400513") failed in /var/lib/ondemand-nginx/config/puns/kstine.conf:1

Looks like it’s still checking the name itself somewhere.

OK well that’s coming from nginx, so there’s nothing I can do there.
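
The by-name lookup that nginx is complaining about can be reproduced from Ruby with `Etc.getgrnam`, which raises `ArgumentError` when no group has the given name. This is a rough check under the assumption that nginx and Ruby see the same NSS configuration; `group_name_resolves?` is a hypothetical helper.

```ruby
require 'etc'

# Hypothetical helper: true when a group with exactly this name exists.
def group_name_resolves?(name)
  Etc.getgrnam(name.to_s)
  true
rescue ArgumentError
  # Raised when no group entry has this name.
  false
end

# Defaults to the string from the nginx error message.
name = ARGV[0] || '1543400513'
status = group_name_resolves?(name) ? 'resolves' : 'does not resolve'
puts "#{name}: #{status} by name"
```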

When you ssh into a login node, what does your id look like? Is your primary group still 1543400513?

I feel like there’s some SSSD/AD magic config we need here to replace that group. @tdockendorf do you have any ideas here?

Yes, the default group for everyone is the “Domain Users” group, which SSSD maps as 1543400513. We can’t change that since AD sets it upstream by default.

For example, the output of id for my user is:

uid=851707146(kstine) gid=1543400513 groups=1543400513,851716347(cluster-users),851717015(cluster-testing)
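
To spot the same pattern for other users, here is a small sketch that picks out the GIDs in an `id` line that have no `(name)` suffix. `unnamed_gids` is a hypothetical helper, and it assumes group names contain no commas.

```ruby
# Hypothetical helper: return the numeric GIDs from the groups= field of an
# `id` output line that carry no "(name)" suffix.
def unnamed_gids(id_output)
  groups_field = id_output[/groups=(.*)/, 1]
  return [] unless groups_field

  groups_field.split(',')
              .reject { |entry| entry.include?('(') }
              .map(&:to_i)
end

line = 'uid=851707146(kstine) gid=1543400513 groups=1543400513,' \
       '851716347(cluster-users),851717015(cluster-testing)'
p unnamed_gids(line) # => [1543400513]
```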

Every user is a part of the cluster-users group and that’s what we check to determine access to the cluster itself.

Yea I’m a little bit at a loss now. Googling around doesn’t turn much up. Are you running winbind by any chance?

We aren’t. It’s the standard SSSD-AD+RealmD domain joining, not samba+winbind.

We’re mapping SIDs to UIDs and GIDs using ldap_id_mapping because our AD instance doesn’t provide UIDs and GIDs, but that’s the only thing (somewhat) out of the ordinary.