Issues with Bundler and Passenger when Updating Open OnDemand 3.0

Hello,

I am currently experiencing issues with Bundler and Passenger on my OOD installation on Ubuntu. I’ve run into a series of errors that I’ve been unable to resolve.

Firstly, when I try to run bundle install with the --path flag, I receive the following deprecation warning and error:

[DEPRECATED] The `--path` flag is deprecated because it relies on being remembered across bundler invocations, which bundler will no longer do in future versions. Instead please use `bundle config set path 'vendor/bundle'`, and stop using this flag 
There was an error while trying to write to `/var/www/ood/apps/sys/dashboard/.bundle/config`. It is likely that you need to grant write permissions for that path.

I attempted to run bundle config set --local path 'vendor/bundle' as recommended by the deprecation message, but it did not resolve the issue.

Additionally, I’m encountering an error with Passenger when accessing the OOD dashboard:

[ E 2024-04-24 14:02:27.7078 1252177/T2i age/Cor/App/Implementation.cpp:221 ]: Could not spawn process for application /var/www/ood/apps/sys/dashboard: The application process exited prematurely.
Error ID: 7b0d623b
Error details saved to: /tmp/passenger-error-tGCXDG.html

As a side note, I first installed OOD v3.0 on CentOS 7.9, and the setup was successful there. However, I was unable to install this version of Slurm on CentOS, so OOD had no communication with the computing nodes. I tried using SSH tunnels to connect them, but I could not figure out a way to connect each user to it’s slurm user in Slurm v23.02, which was not ideal for JupyterLab.

Therefore, OOD v3.0 is now being installed on the master node, which uses Ubuntu 22.04 libraries (such as the following Ruby package: http://archive.ubuntu.com/ubuntu/pool/main/r/ruby3.0/libruby3.0_3.0.2-7ubuntu2.4_amd64.deb). I have run passenger-config validate-install, and it validated both the Passenger and Apache installations as correct.

I’m looking for guidance on resolving these issues, particularly:

  1. The correct way to connect the CentOS host to the computing nodes without installing Slurm.
  2. The correct way to configure Bundler’s installation path in the context of OOD.
  3. Understanding the invalid request error I am seeing and the implications of the Passenger spawning error related to the missing racc gem.

Any help or suggestions from the community would be greatly appreciated!

Hi and welcome!

  1. I’m having trouble parsing the phrase “connect each user to it’s slurm user” - when they ssh from the OOD node to a login node, are they not the same user? You should be able to configure the cluster.yml file to use a submit_host where we’ll ssh to that host to run all the various sbatch & squeue commands. I’d assume they’re the same user on both sides of the SSH connection.
  2. You shouldn’t need to do any of that, and I would indeed discourage running bundle install on our packages.
  3. Can you give me the error here? You may have found a bug. To confirm: this happens on Ubuntu 22.04? I’ll try to replicate this and backport a fix if required.

Hello and thank you for the feedback,

Let me clarify the challenge we are facing with OOD by dividing the discussion into two parts: the CentOS setup and the Ubuntu setup. Note that both setups are using v3.0.

CentOS setup: Although the OOD setup was successful, our primary issue is maintaining user identity across the two systems when executing Slurm commands.

  1. In our current configuration, all users logged into OOD on CentOS are funneled through a single user account on the Ubuntu Slurm server (an account called oodadmin). This is facilitated by SSH keys set up for passwordless access.

While this allows us to remotely execute Slurm commands, it does not preserve the individual user’s identity when interacting with the Slurm server. Instead, every job submitted through OOD appears under the oodadmin account on Ubuntu. This connection used a static account and a single id_rsa key.

Ideally, each user would access the system with their own username, as the email authentication using Dex is already functioning correctly in both setups.

As for the submit_host option, I could not find it listed in the cluster configuration documentation for Slurm, so I am now trying the Linux Host adapter configuration instead. I have created the following settings file:

# /etc/ood/config/clusters.d/clusterbio.yml
---
v2:
  metadata:
    title: "clusterbio"
    url: "https://ood.vhio.net" # Same url as the one serving ood.
    hidden: false
  login:
    host: "172.27.0.66" # HPC master node
  job:
    adapter: "linux_host" # The host uses SLURM internally
    submit_host: "172.27.0.66"  # Setting submit_host as suggested
    ssh_hosts:  # Actual login nodes
      - "172.27.0.35"
      - "172.27.0.225"
      - "172.27.0.184"
      - "172.27.0.117"
    site_timeout: 7200
    debug: true
    singularity_bin: "/usr/bin/singularity" # Verified the path is correct 
    singularity_bindpath: "/etc,/media,/mnt,/opt,/run,/srv,/usr,/var,/users"
    # The singularity image has been created at CentOs 7.9 and moved to Ubuntu
    singularity_image: "/opt/ood/linuxhost_adapter/centos_7.sif" # Located at login host
    strict_host_checking: false
    tmux_bin: "/usr/bin/tmux" # Verified the path is correct at the login host
  acls:
    - adapter: "group"
      groups:
        - "gsg_bioinformatica"
      type: "allowlist"
  batch_connect:
    basic:
      script_wrapper: "module restore\n%s"
      set_host: "host=$(hostname -A | awk '{print $2}')"


Ubuntu setup: Using Ubuntu 22.04, the user mapping would have been correct, since it uses the same OS as the computing nodes; however, the installation failed.

2. Thanks for pointing out the issue with Bundler. I was following the steps from a resolved issue that had a similar error, where the gems were not found. I have a similar symptom: the gem is installed but not found. I am attaching a screenshot for reference:

3. As for the missing gems warning shown in the previous screenshot, it also appears in other parts of the application; for instance, the error I get in the web browser shows it as well.

The previous screenshot shows the message displayed on the webpage after login (the login works correctly with Dex), indicating a 404 error (not found). I verified that the gems were installed, ran gem pristine as suggested, and validated the Passenger and Apache installations with passenger-config validate-install:

However, the issue still persists, and I am seeking further insights or recommendations that might help us resolve these ongoing challenges with user identity management and installation errors.

Please let me know if there is anything else I can do to fix the OOD setup. Many thanks!

For reference, here is a summary of the logs I get with this new configuration file when I open the webpage on CentOS, extracted from /var/log/ondemand-nginx/{oodadmin}/error.log (oodadmin being the username used for the login and 172.27.0.0 the HPC master node):

    {"method": "GET", "path": "/pun/sys/dashboard/", "status": 200, "duration": 14.60, "view": 6.76},
    {"method": "GET", "path": "/pun/sys/dashboard/apps/icon/jupyter/sys/sys", "status": 200, "duration": 6.32, "view": 0.00},
    {"method": "GET", "path": "/pun/sys/dashboard/apps/show/activejobs", "status": 302, "location": "https://ood.vhio.net/pun/sys/dashboard/activejobs", "duration": 4.46, "view": 0.00},
    {"method": "GET", "path": "/pun/sys/dashboard/activejobs", "status": 200, "duration": 9.98, "view": 6.73},
    {"method": "EXECVE", "command": "ssh", "options": ["-t", "-p", "22", "-o", "BatchMode=yes", "-o", "UserKnownHostsFile=/dev/null", "-o", "StrictHostKeyChecking=no"], "host": "oodadmin@172.27.0.0", "tmux_command": "list-panes", "output_format": "#session_name,#session_created,#pane_pid"},
    {"error": "OodCore::JobAdapterError", "message": "Pseudo-terminal will not be allocated because stdin is not a terminal. Permission denied (publickey,password).", "status": "Error", "location": "/opt/ood/ondemand/root/usr/share/gems/3.0/ondemand/3.0.3-1/gems/ood_core-0.23.5/lib/ood_core/job/adapters/linux_host.rb:99"},
    {"method": "GET", "path": "/pun/sys/dashboard/activejobs.json", "status": 200, "duration": 94.81, "view": 0.00}

It seems to return an error when it attempts to connect via SSH to the Slurm node. I could add an id_rsa key to make this “passwordless”, but that would require collecting a public key for every user, which is not ideal.

Thanks in advance!

OK - on the CentOS side - you want HostBasedAuthentication. This will allow users to ssh here and there without keys (because the hosts have keys). You still have to manage a few keys - but much fewer.
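For readers following along, host-based authentication in OpenSSH roughly involves the pieces printed by the sketch below. This is only an outline under stated assumptions, not the exact setup used in this thread: the function name print_hostbased_config and the hostname ood.example.org are hypothetical, and the file locations are the common OpenSSH defaults, which can differ between distributions.

```shell
#!/bin/sh
# Print the settings typically involved in OpenSSH host-based authentication.
# $1 = hostname of the SSH client (here: the OOD web node). Hypothetical value.
print_hostbased_config() {
  client_host="$1"
  cat <<EOF
# --- On the server (Slurm login/master node): /etc/ssh/sshd_config ---
HostbasedAuthentication yes

# --- On the server: /etc/ssh/shosts.equiv (one trusted client per line) ---
${client_host}

# --- On the server: /etc/ssh/ssh_known_hosts must contain the client's
# --- public host key, e.g. collected with: ssh-keyscan ${client_host}

# --- On the client (OOD node): /etc/ssh/ssh_config ---
EnableSSHKeysign yes
Host *
    HostbasedAuthentication yes
EOF
}

# Example run with a placeholder hostname.
print_hostbased_config "ood.example.org"
```

After changing sshd_config the SSH daemon has to be restarted, and running ssh -v from the OOD node as a normal user usually shows whether the hostbased method is being offered and accepted.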

On the Ubuntu side - I’m not quite sure what’s going on, but I’m sure you shouldn’t be running passenger-config validate-install - maybe that’s what’s throwing this off? There’s no need to run it and indeed it may be messing something up if it’s editing the URLs like that. Did you get that from our documentation? It shouldn’t be telling you to run that.

Which is to say on the Ubuntu system - I’d try reinstalling then not running passenger-config validate-install and see how that works out of the box.

Regarding the Ubuntu Passenger installation, I didn’t immediately validate the installation. I did that when I got the error and I decided to investigate the error message “invalid request: /sys/%3Cwbr%3E//www.phusionpassenger.com”.

Afterwards, I ensured that Passenger was installed correctly by uninstalling and installing again following the steps outlined in the Passenger documentation, particularly:

Step 3: check installation

After installation, please validate the install by running sudo /usr/bin/passenger-config validate-install.

This basically confirms that both Apache and Passenger were installed correctly.

This led me to consider that the error might be related to how the request is constructed, given the unusual parsing observed with “/sys/%3Cwbr%3E//www.phusionpassenger.com”. Perhaps there’s an issue with parsing, causing the unexpected behavior.

Moving forward, I’ll explore the suggested steps for CentOS regarding Host-Based Authentication.

Thank you for your response!

I have no idea why it’d be redirecting you like that. Can you share your ood_portal.yml and nginx_stage.yml configurations? You can obfuscate anything you need.

But again, I’m not really sure how you can get into this state. Indeed, /usr/bin/passenger-config shouldn’t even exist - we install libraries in a different location, /opt/ood.

My guess is the dependencies are all out of whack. Like you installed the generic passenger from Ubuntu universe and not the one we distribute. That’s just about my only guess.

I’ve been consistently using the same configuration for both CentOS and Ubuntu, with only minor differences in paths. Here are the configurations I’ve been using:

---
# /etc/ood/config/ood_portal.yml

title: "{HPC PORTAL}"
logo: "{Logo_path}"

servername: '{ORG}'

# Default: null (no SSL support)
ssl:
  - 'SSLCertificateFile "{PATH}.crt"'
  - 'SSLCertificateKeyFile "{PATH}.key"'
  - 'SSLCertificateChainFile "{PATH}.crt"'

# Reverse proxy configuration
host_regex: '{NODE}(2|3|4)?\.{ORG}\.org'
node_uri: '/node'
rnode_uri: '/rnode'

# OIDC remote user claim. This is the claim that populates REMOTE_USER
# Example:
oidc_remote_user_claim: email

dex:
  ssl: true
  https_port: "5554"

  # Docs: https://github.com/dexidp/dex/blob/master/Documentation/connectors/ldap.md
  connectors:
  - type: ldap
    name: ActiveDirectory
    id: ad
    config:
      host: {SERVER:PORT}

      insecureNoSSL: true
      insecureSkipVerify: true

      bindDN: cn={ADMIN_USER},cn=Users,dc={ORG},dc={ORG}
      bindPW: {ADMIN_PASS}
      usernamePrompt: Email Address

      userSearch:
        baseDN: dc={ORG},dc=org
        filter: "(objectClass=person)"
        username: userPrincipalName
        idAttr: distinguishedName
        emailAttr: userPrincipalName
        nameAttr: cn

      groupSearch:
        baseDN: cn={GROUP},dc={ORG},dc={ORG}
        filter: "(objectClass=group)"
        userMatchers:
        - userAttr: distinguishedName
          groupAttr: member
        nameAttr: cn

And as for the nginx_stage.yml:

---
# /etc/ood/config/nginx_stage.yml

# Root directory where apps are installed for each type
app_root:
  dev: '~%{owner}/ondemand/dev/%{name}'  # User development apps in home directory
  usr: '/var/www/ood/apps/usr/%{owner}/gateway/%{name}'  # User-specific apps
  sys: '/var/www/ood/apps/sys/%{name}'  # System-wide apps

# SSL configuration for HTTPS
ssl:
  - 'SSLCertificateFile "{PATH}.crt"'
  - 'SSLCertificateKeyFile "{PATH}.key"'
  - 'SSLCertificateChainFile "{PATH}.crt"'

# Custom environment variables for the PUN
pun_custom_env:
  OOD_DASHBOARD_TITLE: "Title"
  OOD_BRAND_BG_COLOR: "#53565a"
  OOD_BRAND_LINK_ACTIVE_BG_COLOR: "#fff"

# Default settings for various paths used by nginx_stage
template_root: '/opt/ood/nginx_stage/templates'  # NGINX config templates
proxy_user: 'apache'  # User that the NGINX proxy runs as, apache for ubuntu
nginx_bin: '/opt/ood/ondemand/root/usr/sbin/nginx'  # Verified path exists

Currently, I’m in the process of implementing Host-Based Authentication on CentOS, utilizing the configurations provided earlier along with the /etc/ood/config/clusters.d/clusterbio.yml file.

If there’s any mistake detected or if you have any suggestions, please let me know.

Can you check your journalctl logs for something like this when the PUN starts up.

Apr 30 18:17:34 976bbd4f38bd sudo[280]:   apache : PWD=/ ; USER=root ; COMMAND=/opt/ood/nginx_stage/sbin/nginx_stage pun -u jeff -a http%3a%2f%2flocalhost%3a8080%2fnginx%2finit%3fredir%3d%24http_x_forwarded_escaped_uri

There doesn’t appear to be anything wrong with those configs - but my heart is still set on somehow the configurations messing this up.

Yea that %3Cwbr%3E is <wbr> - an HTML word-break tag, suggesting what you’ve said - some parsing error. Again, I don’t see anything wrong with the configs, but I would ask that you comment out anything that isn’t strictly necessary. You have a few settings that you don’t need because they’re already the default value.
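To see what the percent-encoded strings in this thread contain, a small decoder helps. The sketch below is a hypothetical helper (urldecode is not part of OnDemand or Passenger) that decodes each %XX escape exactly once using portable awk; it assumes the input is plain ASCII.

```shell
#!/bin/sh
# Decode %XX escapes in a string exactly once (ASCII input assumed).
urldecode() {
  printf '%s\n' "$1" | awk '
    BEGIN { hex = "0123456789ABCDEF" }
    {
      out = ""
      s = $0
      # Consume the string left to right so decoded output is never re-decoded.
      while (match(s, /%[0-9A-Fa-f][0-9A-Fa-f]/)) {
        hi = index(hex, toupper(substr(s, RSTART + 1, 1))) - 1
        lo = index(hex, toupper(substr(s, RSTART + 2, 1))) - 1
        out = out substr(s, 1, RSTART - 1) sprintf("%c", hi * 16 + lo)
        s = substr(s, RSTART + RLENGTH)
      }
      print out s
    }'
}

urldecode '%3Cwbr%3E'        # prints: <wbr>
urldecode '%253Cwbr%253E'    # prints: %3Cwbr%3E (the value was encoded twice)
```

A value such as %253Cwbr%253E needs two passes to get back to <wbr>, which is a sign that an already-encoded path was encoded a second time on its way into a redirect parameter.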

I searched our configs for that hostname phusionpassenger and can’t find it anywhere in our source code - Code search results · GitHub

Meaning, somehow that’s coming from your configs and/or your system. I could be wrong here of course, but I don’t see how the distribution (our source code) could come up with that URL.

Looking into the Ubuntu issue more - I’m finding that invalid request: is in a code path that I’m not quite sure how you arrived at.

Which is to say - what actions did you take to get into that state? What’s the URL you were trying to access and what’s the URL of the page you’re seeing the errors on?

It seems that we’re trying to initialize an application. But an out of the box install shouldn’t need to initialize any application - the ones we distribute are already initialized. This code path comes into play when you are developing other fullstack applications within ondemand - which is why I ask about navigation and how you arrived at that page.

I suspect that the “invalid request” message stems from an interpretation of the 404 response code, caused by an erroneous redirection to a non-existent location.

This issue arises immediately after a successful authentication with Dex and blocks access to the webpage after authentication. I’ll add some screenshots step by step to clarify this:

  1. When accessing the Ubuntu URL for OOD, ood.{MyOrg}.net (placeholder for my actual URL), I’m redirected to the login page:

ood.{MyOrg}.net/dex/auth/ad/login?back=&state=m5dsmp2cesps5xek4r2md2bbj

  2. After entering login credentials and successfully authenticating, I’m redirected to:

https://ood.{MyOrg}.net/pun/sys/dashboard

  3. Expanding the link at “Technical details for the administrator of this website” reveals:

  4. Clicking on “Phusion Passenger(R)” leads to an unexpected URL:

https://ood.{MyOrg}.net/pun/sys/<wbr>//www.phusionpassenger.com

In this URL I can already see a strange <wbr>.

  5. Further, clicking “Initialize App” redirects to:

https://ood.{MyOrg}.net/nginx/init?redir=%2fpun%2fsys%2f%253Cwbr%253E%2f%2fwww.phusionpassenger.com

This results in the “Invalid request” error message discussed above:

Please let me know if there are any mistakes or if further clarification is needed.

Got it, thanks! Yea that Initialize App link won’t work. I’m surprised that links from the page on item 3 link back to OnDemand somehow…

In any case - #3 is where you stop. Again, I’m surprised those links link back to your OnDemand instance somehow, but that’s where a lot of my confusion came from.

OK - So stopping at #3 and not clicking any links you need to find the tmp HTML file it generated (/tmp/passenger-error-tGCXDG.html from your original comment) and look at it. It should detail what errors you may have.
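If no browser is available on the server, the report can also be skimmed from a terminal. The following is a rough sketch: latest_passenger_error and strip_tags are hypothetical helper names, the tag stripping is deliberately crude, and the grep keywords in the usage comment are just a starting point.

```shell
#!/bin/sh
# Print the most recently modified passenger-error-*.html file in a directory.
# $1 = directory to search (defaults to /tmp). Prints an empty line if none exist.
latest_passenger_error() {
  dir="${1:-/tmp}"
  newest=""
  for f in "$dir"/passenger-error-*.html; do
    [ -e "$f" ] || continue
    if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
      newest="$f"
    fi
  done
  printf '%s\n' "$newest"
}

# Crudely replace HTML tags with spaces so the report can be grepped as text.
strip_tags() {
  sed -e 's/<[^>]*>/ /g' "$1"
}

# Usage:
#   report="$(latest_passenger_error /tmp)"
#   strip_tags "$report" | grep -i -n -E 'error|gem|GEM_PATH' | head -n 40
```

Opening the HTML file in a browser remains the easier way to read it, since the report is organised into tabs.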

OK - sorry for the run around. The whole phusionpassenger.com thing was a red herring. I am now suspecting that this is the error. I will try to replicate this today and see what I find.

I just tried to replicate in a container and couldn’t. Here’s my relevant system info. You can see I can find racc 1.7.3 which is located in /opt/ood/gems. Issuing gem environment shows that /opt/ood/gems is in my gem path.

root@7bc929fbda56:/# source /opt/ood/ondemand/enable 
root@7bc929fbda56:/# ruby --version
ruby 3.0.2p107 (2021-07-07 revision 0db68f0233) [x86_64-linux-gnu]
root@7bc929fbda56:/# which ruby
/usr/bin/ruby
root@7bc929fbda56:/# gem list racc

*** LOCAL GEMS ***

racc (1.7.3, default: 1.5.1)
root@7bc929fbda56:/# cat /etc/os-release 
PRETTY_NAME="Ubuntu 22.04.4 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.4 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy

I can replicate by just removing that gem (rm -rf). If I look at my /tmp passenger error file

Navigate to these tabs

I can see /opt/ood/gems in my GEM_PATH, which is why it worked for me out of the box.

Thinking about this more - I dug up how nginx_stage (/opt/ood/nginx_stage/sbin/nginx_stage) sets the environment.

It’s through this file.

root@f99cdebe3a3b:/# cat /opt/ood/nginx_stage/etc/profile      
# For Software Collections 2.0
#
# 1. Read in environment variable SCL_PKGS which may be set in `sudo` call
#    otherwise fallback to default software collection packages.
#
# 2. Check if Software Collections is installed, then source the defined
#    package scripts.
#
SCL_PKGS=${SCL_PKGS:-"ondemand"}
SCL_SOURCE="$(command -v scl_source)"
DEB_SOURCE="/opt/ood/ondemand/enable"
if [[ "${SCL_SOURCE}" ]]; then
  source "${SCL_SOURCE}" enable ${SCL_PKGS}
else
  [[ -e "${DEB_SOURCE}" ]] && source "${DEB_SOURCE}" || :
fi

I wonder what you’d get if you issue these commands in a CLI. That is, whether the GEM_PATH is being set correctly and it can find the racc gem.

Can it be as simple as Ubuntu using sh instead of bash and so the source command doesn’t exist?

root@7bc929fbda56:/# source /opt/ood/ondemand/enable
root@f99cdebe3a3b:/# echo $GEM_PATH
/opt/ood/gems:
root@7bc929fbda56:/# gem list racc

I also wonder if you have the $NGINX_PROFILE environment variable set, or if the file /etc/ood/profile exists. You’ll find that in the nginx_stage script we source that file. I wonder if there’s something in that file that’s throwing this off.

NGINX_PROFILE=${NGINX_PROFILE:-/etc/ood/profile}
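One way to run this check without the interactive shell's own environment getting in the way is to source the profile in an otherwise empty bash environment. The sketch below assumes bash is installed; the helper names gem_path_after_profile and path_list_contains are hypothetical and not part of nginx_stage.

```shell
#!/bin/sh
# Print the GEM_PATH that results from sourcing a profile in a clean bash.
# $1 = profile file to source.
gem_path_after_profile() {
  env -i bash -c 'source "$1" >/dev/null 2>&1; printf "%s\n" "${GEM_PATH:-}"' _ "$1"
}

# Return 0 if the colon-separated list $1 contains the entry $2.
path_list_contains() {
  case ":$1:" in
    *":$2:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage (paths as quoted in this thread):
#   gp="$(gem_path_after_profile /opt/ood/nginx_stage/etc/profile)"
#   path_list_contains "$gp" /opt/ood/gems && echo "ok" || echo "missing: $gp"
#   [ -e /etc/ood/profile ] && echo "/etc/ood/profile exists; check its contents"
```

If the result differs from what an interactive root shell shows, then something in the login environment, rather than the profile itself, is supplying GEM_PATH.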

As an aside - that href we’re so confused about is actually in the Passenger’s error page, so that’s very odd.

I found this issue for it - WBRs in URL · Issue #2446 · phusion/passenger · GitHub - which the developer closed, saying it was a duplicate (of what, I can’t say). So hopefully that gets fixed upstream at some point.

Moving forward: following your recommendation, I enabled Host-Based Authentication on CentOS and successfully connected without a password. However, the “Active jobs” page would not load.

Here’s what I found in the OOD logs when accessing the “Active jobs” webpage:

OodCore::JobAdapterError: Pseudo-terminal will not be allocated because stdin is not a terminal.
error connecting to /tmp/tmux-1406410919/default (No such file or directory)

After reviewing these logs, I tested SSH from the CentOS host to the Ubuntu server that hosts Slurm. Tmux worked fine there: I was able to create a session, list it, and then delete it.

Previously, I had the Slurm adapter instead of the Linux adapter (I’ve listed the cluster configuration on a previous post). I suspect there might be a configuration issue causing this error.

Any help is greatly appreciated!

I’ll have to replicate to see. I see you have debug: true set so you should be able to find shell scripts that we use to submit. You can try to use those to replicate.

You can look into the /var/log/ondemand-nginx/$USER/error.log for execve lines. These lines are the actual commands we’re issuing.
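A quick way to pull those lines out is a case-insensitive grep. This is a small sketch with a hypothetical helper name (show_execve_lines); the log path pattern is the one quoted in this thread, and USERNAME is a placeholder.

```shell
#!/bin/sh
# Show the last N log lines that mention execve (case-insensitive).
# $1 = log file, $2 = number of lines to show (default 20).
show_execve_lines() {
  grep -i 'execve' "$1" | tail -n "${2:-20}"
}

# Usage:
#   show_execve_lines /var/log/ondemand-nginx/USERNAME/error.log 10
```

The command and arguments in a matching line can then be re-run by hand as the same user to reproduce the failure outside the dashboard.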

I just checked and we use the -t flag when we ssh. Reading online, however, it appears you can use multiple -t, like -t -t - though I’m not 100% sure what the difference is. It’s not out of the question that you may have found a bug.

If you have a Slurm cluster - I’d suggest connecting to it. My guess is the active jobs not having any data is the same issue - some oddity in the linux host adapter. Slurm should be much more stable than the linux host adapter.
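As a starting point for that, the sketch below writes a minimal cluster file that uses the Slurm adapter with a submit_host. It is an assumption-laden example rather than a verified configuration: write_slurm_cluster_config is a hypothetical helper, the host and title are the values posted earlier in this thread, and the submit_host and strict_host_checking keys should be checked against the OnDemand cluster configuration documentation for your version.

```shell
#!/bin/sh
# Write a minimal Slurm cluster config for OnDemand to the file given as $1.
# $2 = host that OOD should SSH to for sbatch/squeue, $3 = cluster title.
write_slurm_cluster_config() {
  cat > "$1" <<EOF
---
v2:
  metadata:
    title: "$3"
  login:
    host: "$2"
  job:
    adapter: "slurm"
    submit_host: "$2"
    strict_host_checking: false
EOF
}

# Write to a scratch file first and review it before copying it into
# /etc/ood/config/clusters.d/ on the OOD node.
scratch="$(mktemp)"
write_slurm_cluster_config "$scratch" "172.27.0.66" "clusterbio"
cat "$scratch"
```

With a submit_host set, the Slurm commands run over SSH as the logged-in user, so this relies on the same per-user SSH access (for example host-based authentication) discussed earlier in the thread.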

That said - I’ll try to replicate and report back.

Though I would like to hear about the Ubuntu system, whether you have the same GEM_PATH, and the other comments I’d had around the source command. We don’t release for CentOS 7 anymore, so Ubuntu is your longer term solution here.