Users with non-traditional GID cannot launch interactive apps

Hi folks,

Recently, some users reported that they are seeing the following message when they try to launch an interactive app:

sending incremental file list
rsync: chgrp "/home/tfabrizi/ondemand/data/sys/dashboard/batch_connect/sys/bc_desktop/hpc/output/ae87fced-2182-4dfa-b033-c27350a277c7/." failed: Operation not permitted (1)
rsync: chgrp "/home/tfabrizi/ondemand/data/sys/dashboard/batch_connect/sys/bc_desktop/hpc/output/ae87fced-2182-4dfa-b033-c27350a277c7/desktops" failed: Operation not permitted (1)
./
desktops/
desktops/gnome.sh
desktops/kde.sh
desktops/mate.sh
desktops/xfce.sh

sent 4,143 bytes  received 460 bytes  9,206.00 bytes/sec
total size is 3,776  speedup is 0.82
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1189) [sender=3.1.3]

The users who reported this error all have legacy accounts, and thus, their GID is non-traditional (the vast majority of our new users have a default GID which works with interactive apps, but the older users have legacy GIDs that are not the default).

It seems to me like there is a call to rsync -a or rsync -g when the user hits launch, and I believe that this rsync call is responsible for copying the application from /var/www/ood/apps/sys/app_xyz to the user’s working dir in their home directory (~/ondemand/data/sys/dashboard/batch_connect/...).

I can replicate the error with a test account whose GID is testgid when I try to rsync -g /var/www/ood/apps/sys/app_xyz ~/ or rsync -a /var/www/ood/apps/sys/app_xyz ~/. That same call succeeds if I use my regular user account, with GID default.

It is almost impossible to switch over all our legacy users to the new default GID.

Why is the -g or -a flag passed to rsync? Is there any way I can remove this flag without breaking the system? Are there any other suggestions for handling multiple GIDs?

Thank you all!
Walid

Hello and sorry for the issue! I have a few questions up front. What does it mean for a user to have a “legacy GID”, and how does a legacy GID differ from a normal one? Are you on OOD 3.1 or a different version? What OS are you on? Lastly, the logs show the desktop app being used, but does this happen with any app they launch? Sorry for all that, but it may help me understand better.

I just used the man page for rsync for the easy part.

For the g flag:

       --group, -g
              This  option  causes  rsync  to set the group of the destination
              file to be the same as the source file.  If the  receiving  pro‐
              gram  is  not  running  as  the super-user (or if --no-super was
              specified), only groups that the invoking user on the  receiving
              side is a member of will be preserved.  Without this option, the
              group is set to the default group of the invoking  user  on  the
              receiving side.

              The  preservation  of  group information will associate matching
              names by default, but may fall back to using the  ID  number  in
              some circumstances (see also the --numeric-ids option for a full
              discussion).

So it’s to preserve the group ownership of the copied files.
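
To make the membership rule from the man page concrete, here is a minimal Ruby sketch. The helper names (current_gids, member_of_gid?, group_report) are made up for illustration and are not part of OnDemand or rsync. It only compares the group that owns a path with the groups of the current process, which is roughly the condition under which an unprivileged rsync -g can keep the source group.

```ruby
require 'etc'

# All group IDs the current process belongs to (real, effective, supplementary).
def current_gids
  ([Process.gid, Process.egid] + Process.groups).uniq
end

# True when the current process is a member of the given numeric GID.
def member_of_gid?(gid)
  current_gids.include?(gid)
end

# Hypothetical helper: report which group owns `path` and whether the
# current user is a member of that group.
def group_report(path)
  gid = File.stat(path).gid
  name = begin
    Etc.getgrgid(gid)&.name
  rescue ArgumentError
    nil # the GID has no entry in the group database
  end
  { path: path.to_s, gid: gid, group: name, member: member_of_gid?(gid) }
end
```

Running group_report('/var/www/ood/apps/sys/app_xyz') as one of the affected users should show member: false if the theory in this thread is right.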

For the a flag:

       --archive, -a
              This is equivalent to -rlptgoD.  It is a quick way of saying you
              want recursion and want to preserve almost everything.  Be aware
              that it does not include  preserving  ACLs  (-A),  xattrs  (-X),
              atimes  (-U),  crtimes  (-N),  nor the finding and preserving of
              hardlinks (-H).

              The only exception to the above equivalence is when --files-from
              is specified, in which case -r is not implied.

This makes sure we get recursive descent of what is in the directory being copied. Because it expands to -rlptgoD, it also implies -g (and -o).
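
As a small illustration of the -rlptgoD equivalence quoted above, this Ruby sketch shows which short options remain once the ownership-related ones are dropped. The constant and method names are hypothetical and exist only for this example; nothing here calls rsync.

```ruby
# Short options that -a stands for, per the man page excerpt (-rlptgoD).
ARCHIVE_FLAGS = %w[r l p t g o D].freeze

# Return a combined short-option string for -a minus the given flags.
def archive_without(*dropped)
  "-#{(ARCHIVE_FLAGS - dropped.map(&:to_s)).join}"
end

puts archive_without                  # -rlptgoD
puts archive_without(:g, :o)          # -rlptD
puts archive_without(:t, :g, :o, :D)  # -rlp
```

The last line is the set that, with -v added, gives the -rlpv form mentioned elsewhere in this thread.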

Which code are you looking at that prompted the question, though? I couldn’t find the -g or -a flags being used.

I think the code you are referring to is in the staging step, where there is an rsync -rlpv call:

Hi Travis,

Sorry for the confusing initial post. This is an extremely confusing issue to articulate. :smile:

About legacy GID: Some 4 years ago, we started creating accounts with GID “Domain Users”. The directory /var/www/ood/apps/ is owned by svcopenondemand:"Domain Users". Domain Users is the new default GID. When I say legacy GID, I mean that some users’ GID is somePInamegrp, and not Domain Users. These are the users facing the issue.

I am currently on OOD 3.0.3. I am running RHEL 8.8.

The example I sent is with desktop, but all other apps cause the same issue.

I did not find the rsync line being invoked. I only replicated the error when I ran rsync -g or rsync -a manually. I believe the code line you sent is the correct one. The issue might be coming from the -p flag, given it’s preserving permissions.

Haha, no worries. Everyone has a little uniqueness to their systems. Thank you for all the info on the versions and behavior.

I think, based on how these different groups work, you are likely right that the p flag may be causing the issue. It looks like a chgrp is attempted on the session output directory, which holds all the output and shell files. The p flag causes the receiving rsync to set the destination permissions to be the same as the source permissions. What happens if you use the legacy GID without the p in your tests? I’m wondering if rsync is trying to issue that chgrp because the output directory has a different group than the source it was copied from, since those legacy users likely aren’t members of the group that owns the source files. Is that true?

I don’t have a good solution off the top of my head, but I’m wondering if something with App Sharing could help:
https://osc.github.io/ood-documentation/latest/how-tos/app-development/app-sharing.html?highlight=groups

Otherwise, we might need a check on the user’s groups there, and alter how rsync is called based on the result.
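
To sketch what such a check could look like (purely hypothetical, not how OnDemand does it): pick the rsync options depending on whether the user belongs to the group that owns the source files. The method name rsync_flags_for is invented for this example, and the two option strings are simply the ones discussed in this thread.

```ruby
# Hypothetical: choose rsync options from the source group and the
# list of numeric GIDs the user belongs to.
def rsync_flags_for(source_gid, user_gids)
  if user_gids.include?(source_gid)
    '-av'    # user is in the source group, so -g can keep that group
  else
    '-rlpv'  # leave out -g/-o so the source group is not carried over
  end
end
```

In practice it could be called as rsync_flags_for(File.stat(root).gid, [Process.egid, *Process.groups]), where root is the app directory being staged.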

3.1 fixed this, specifically the chgrp issues; the PR that fixed it even mentions chgrp. I’d suggest upgrading to 3.1 so you don’t have to edit your own files.

That said - if you do want to edit the files (which I would discourage), this is the file you’d need to edit in /var/www/ood/apps/dashboard.

Ahh!! That makes sense. I will upgrade OnDemand next week.

Looking at dashboard/app/models/batch_connect, it looks like version 3.0.3 used rsync -a, which implies -g and so tries to preserve group ownership. Here’s what I have:

    # Stage the app's job template to user's staging directory
    # @param root [#to_s] root directory that gets staged
    # @param context [Object] context available when rendering staged files
    # @return [Boolean] whether staged successfully
    def stage(root, context: nil)
      staged_root.tap { |p| p.mkpath unless p.exist? }

      # Sync the template files over
      oe, s = Open3.capture2e('rsync', '-av', '--exclude', '*.erb', "#{root}/", staged_root.to_s)
      raise oe unless s.success?

      # Output user submitted context attributes for debugging purposes
      user_defined_context_file.write(JSON.pretty_generate context.as_json)

      # Render all template files using ERB
      render_erb_files(
        template_files(root),
        root_dir: root,
        binding: TemplateBinding.new(self, context).get_binding
      )
      true
    rescue => e   # rescue from all standard exceptions (app never crashes)
      errors.add(:stage, e.message)
      Rails.logger.error("ERROR: #{e.class} - #{e.message}")
      false
    end

I changed the -av to -rlpv on our test instance as a stopgap, but I will make sure to upgrade as soon as next week.
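
For anyone who wants to try the same change outside the dashboard first, here is a standalone Ruby sketch of the rsync call with the options swapped. The method names staging_rsync_command and run_staging_rsync are made up for this example; only the rsync arguments mirror the stage method above, and it assumes rsync is on the PATH.

```ruby
require 'open3'

# Build the rsync argv for staging. `flags` defaults to the -rlpv variant
# discussed in this thread instead of the -av used in 3.0.3.
def staging_rsync_command(root, staged_root, flags: '-rlpv')
  ['rsync', flags, '--exclude', '*.erb', "#{root}/", staged_root.to_s]
end

# Run the command and raise with rsync's combined output on failure,
# the same way the stage method does.
def run_staging_rsync(root, staged_root, flags: '-rlpv')
  cmd = staging_rsync_command(root, staged_root, flags: flags)
  output, status = Open3.capture2e(*cmd)
  raise output unless status.success?

  output
end
```

Calling run_staging_rsync with flags: '-av' as a legacy-GID user should reproduce the chgrp error, while the default should stage cleanly if the diagnosis here is correct.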

Thank you both for the support. You are always wonderful and informative. Thanks for this great platform. It’s quite frankly revolutionizing HPC. :smile: