CAS integration: (was 401 Unauthorized) now is: Index of /

We’re on RHEL 9 doing a fresh install, SELinux disabled, with:

ondemand-runtime-3.1.5-1.el9.x86_64
ondemand-nginx-1.24.0-1.p6.0.20.ood3.1.5.el9.x86_64
ondemand-gems-3.1.7-1-3.1.7-1.el9.x86_64
ondemand-nodejs-3.1.5-1.el9.x86_64
ondemand-ruby-3.1.5-1.el9.x86_64
ondemand-passenger-6.0.20-1.ood3.1.5.el9.x86_64
ondemand-apache-3.1.5-1.el9.x86_64
ondemand-3.1.7-1.el9.x86_64

And following the CAS instructions, we’re getting a 401 Unauthorized:
https://openondemand.ourdomain.edu/?ticket=ST-xxx

In httpd.conf we have:

<Location />
    AuthType CAS
    Require valid-user
</Location>
Include conf.modules.d/*.conf
LoadModule auth_cas_module /usr/lib64/httpd/modules/mod_auth_cas.so

Based on the linked article, we also set (with the correct domain):

# mkdir /var/cache/httpd/mod_auth_cas
# chown apache:apache /var/cache/httpd/mod_auth_cas

# vi /etc/httpd/conf.d/auth_cas.conf
LoadModule auth_cas_module modules/mod_auth_cas.so
CASCookiePath /var/cache/httpd/mod_auth_cas/
CASCertificatePath /etc/pki/tls/certs/ca-bundle.crt
CASLoginURL https://sso.yourdomain/cas/login
CASValidateURL https://sso.yourdomain/cas/serviceValidate


# vi /var/www/html/.htaccess
AuthType CAS
Require valid-user

Running:

source /opt/ood/ondemand/enable
bin/rake -T test:jobs
rake test:jobs       # Test all clusters
rake test:jobs:ourcluster  # Test the cluster: ourcluster
rake test:jobs:ourcluster
Skipping 'ourcluster' as it doesn't allow job submission.

Looks similar to this old issue.

What other steps are needed? Our CAS config options have this:

ticket validation:

  casServerUrlPrefix:
    https://cas.ouruni.edu/cas/ (production)

  CAS2 protocol:
    validation path:         serviceValidate
    ticketParameterName:     ticket
    serviceParameterName:    service

  CAS3 protocol:
    validation path:         /p3/serviceValidate
    ticketParameterName:     ticket
    serviceParameterName:    service 

  SAML 1.1 protocol:
    validation path:         samlValidate
    artifactParameterName:   SAMLArt
    serviceParameterName:    TARGET
    redirectAfterValidation: true

  WIND protocol (deprecated):
    validation path:         validate
    ticketParameterName:     ticketid
    serviceParameterName:    destination

/etc/ood/config/ood_portal.yml has:

auth:
  - 'AuthType CAS'
#  - 'Require group ood'
#  - 'AuthGroupFile /sw/hprc/local/etc/ood/ood.cas'
  - 'RequestHeader edit* Cookie "(^MOD_AUTH_CAS[^;]*(;\s*)?|;\s*MOD_AUTH_CAS[^;]*)" ""'
  - 'RequestHeader unset Cookie "expr=-z %{req:Cookie}"'
  - 'CASScope /'
logout_redirect: 'https://cas.ourdomain.edu/cas/logout'

Update: I had an incorrect value for CASValidateURL, which I’ve fixed, but now I get:

# Index of /

| [ICO] | Name | Last modified | Size | Description |
| --- | --- | --- | --- | --- |

Update on the test job:

rake test:jobs:ourcluster --trace
** Invoke test:jobs:ourcluster (first_time)
** Invoke environment (first_time)
** Execute environment
** Invoke /root/test_jobs (first_time, not_needed)
** Execute test:jobs:ourcluster
Testing cluster 'ourcluster'...
Submitting job...
rake aborted!
OodCore::JobAdapterError: No ED25519 host key is known for ourcluster.ouruni.edu and you have requested strict checking.
Host key verification failed.
/opt/ood/ondemand/root/usr/share/gems/3.1/ondemand/3.1.7-1/gems/ood_core-0.25.0/lib/ood_core/job/adapters/slurm.rb:530:in `rescue in submit'
/opt/ood/ondemand/root/usr/share/gems/3.1/ondemand/3.1.7-1/gems/ood_core-0.25.0/lib/ood_core/job/adapters/slurm.rb:468:in `submit'
/var/www/ood/apps/sys/dashboard/lib/tasks/test.rake:31:in `block (4 levels) in <top (required)>'
/opt/ood/ondemand/root/usr/share/gems/3.1/ondemand/3.1.7-1/gems/rake-13.1.0/lib/rake/task.rb:281:in `block in execute'
/opt/ood/ondemand/root/usr/share/gems/3.1/ondemand/3.1.7-1/gems/rake-13.1.0/lib/rake/task.rb:281:in `each'
/opt/ood/ondemand/root/usr/share/gems/3.1/ondemand/3.1.7-1/gems/rake-13.1.0/lib/rake/task.rb:281:in `execute'
/opt/ood/ondemand/root/usr/share/gems/3.1/ondemand/3.1.7-1/gems/rake-13.1.0/lib/rake/task.rb:219:in `block in invoke_with_call_chain'
/opt/ood/ondemand/root/usr/share/gems/3.1/ondemand/3.1.7-1/gems/rake-13.1.0/lib/rake/task.rb:199:in `synchronize'
/opt/ood/ondemand/root/usr/share/gems/3.1/ondemand/3.1.7-1/gems/rake-13.1.0/lib/rake/task.rb:199:in `invoke_with_call_chain'
/opt/ood/ondemand/root/usr/share/gems/3.1/ondemand/3.1.7-1/gems/rake-13.1.0/lib/rake/task.rb:188:in `invoke'
/opt/ood/ondemand/root/usr/share/gems/3.1/ondemand/3.1.7-1/gems/rake-13.1.0/lib/rake/application.rb:182:in `invoke_task'
/opt/ood/ondemand/root/usr/share/gems/3.1/ondemand/3.1.7-1/gems/rake-13.1.0/lib/rake/application.rb:138:in `block (2 levels) in top_level'
/opt/ood/ondemand/root/usr/share/gems/3.1/ondemand/3.1.7-1/gems/rake-13.1.0/lib/rake/application.rb:138:in `each'
/opt/ood/ondemand/root/usr/share/gems/3.1/ondemand/3.1.7-1/gems/rake-13.1.0/lib/rake/application.rb:138:in `block in top_level'
/opt/ood/ondemand/root/usr/share/gems/3.1/ondemand/3.1.7-1/gems/rake-13.1.0/lib/rake/application.rb:147:in `run_with_threads'
/opt/ood/ondemand/root/usr/share/gems/3.1/ondemand/3.1.7-1/gems/rake-13.1.0/lib/rake/application.rb:132:in `top_level'
/opt/ood/ondemand/root/usr/share/gems/3.1/ondemand/3.1.7-1/gems/rake-13.1.0/lib/rake/application.rb:83:in `block in run'
/opt/ood/ondemand/root/usr/share/gems/3.1/ondemand/3.1.7-1/gems/rake-13.1.0/lib/rake/application.rb:208:in `standard_exception_handling'
/opt/ood/ondemand/root/usr/share/gems/3.1/ondemand/3.1.7-1/gems/rake-13.1.0/lib/rake/application.rb:80:in `run'
/opt/ood/ondemand/root/usr/share/gems/3.1/ondemand/3.1.7-1/gems/rake-13.1.0/exe/rake:27:in `<top (required)>'
/bin/rake:25:in `load'
/bin/rake:25:in `<main>'

Caused by:
OodCore::Job::Adapters::Slurm::Batch::Error: No ED25519 host key is known for ourcluster.rc.zi.columbia.edu and you have requested strict checking.
Host key verification failed.
/opt/ood/ondemand/root/usr/share/gems/3.1/ondemand/3.1.7-1/gems/ood_core-0.25.0/lib/ood_core/job/adapters/slurm.rb:387:in `call'
/opt/ood/ondemand/root/usr/share/gems/3.1/ondemand/3.1.7-1/gems/ood_core-0.25.0/lib/ood_core/job/adapters/slurm.rb:266:in `submit_string'
/opt/ood/ondemand/root/usr/share/gems/3.1/ondemand/3.1.7-1/gems/ood_core-0.25.0/lib/ood_core/job/adapters/slurm.rb:528:in `submit'
/var/www/ood/apps/sys/dashboard/lib/tasks/test.rake:31:in `block (4 levels) in <top (required)>'
/opt/ood/ondemand/root/usr/share/gems/3.1/ondemand/3.1.7-1/gems/rake-13.1.0/lib/rake/task.rb:281:in `block in execute'
/opt/ood/ondemand/root/usr/share/gems/3.1/ondemand/3.1.7-1/gems/rake-13.1.0/lib/rake/task.rb:281:in `each'
/opt/ood/ondemand/root/usr/share/gems/3.1/ondemand/3.1.7-1/gems/rake-13.1.0/lib/rake/task.rb:281:in `execute'
/opt/ood/ondemand/root/usr/share/gems/3.1/ondemand/3.1.7-1/gems/rake-13.1.0/lib/rake/task.rb:219:in `block in invoke_with_call_chain'
/opt/ood/ondemand/root/usr/share/gems/3.1/ondemand/3.1.7-1/gems/rake-13.1.0/lib/rake/task.rb:199:in `synchronize'
/opt/ood/ondemand/root/usr/share/gems/3.1/ondemand/3.1.7-1/gems/rake-13.1.0/lib/rake/task.rb:199:in `invoke_with_call_chain'
/opt/ood/ondemand/root/usr/share/gems/3.1/ondemand/3.1.7-1/gems/rake-13.1.0/lib/rake/task.rb:188:in `invoke'
/opt/ood/ondemand/root/usr/share/gems/3.1/ondemand/3.1.7-1/gems/rake-13.1.0/lib/rake/application.rb:182:in `invoke_task'
/opt/ood/ondemand/root/usr/share/gems/3.1/ondemand/3.1.7-1/gems/rake-13.1.0/lib/rake/application.rb:138:in `block (2 levels) in top_level'
/opt/ood/ondemand/root/usr/share/gems/3.1/ondemand/3.1.7-1/gems/rake-13.1.0/lib/rake/application.rb:138:in `each'
/opt/ood/ondemand/root/usr/share/gems/3.1/ondemand/3.1.7-1/gems/rake-13.1.0/lib/rake/application.rb:138:in `block in top_level'
/opt/ood/ondemand/root/usr/share/gems/3.1/ondemand/3.1.7-1/gems/rake-13.1.0/lib/rake/application.rb:147:in `run_with_threads'
/opt/ood/ondemand/root/usr/share/gems/3.1/ondemand/3.1.7-1/gems/rake-13.1.0/lib/rake/application.rb:132:in `top_level'
/opt/ood/ondemand/root/usr/share/gems/3.1/ondemand/3.1.7-1/gems/rake-13.1.0/lib/rake/application.rb:83:in `block in run'
/opt/ood/ondemand/root/usr/share/gems/3.1/ondemand/3.1.7-1/gems/rake-13.1.0/lib/rake/application.rb:208:in `standard_exception_handling'
/opt/ood/ondemand/root/usr/share/gems/3.1/ondemand/3.1.7-1/gems/rake-13.1.0/lib/rake/application.rb:80:in `run'
/opt/ood/ondemand/root/usr/share/gems/3.1/ondemand/3.1.7-1/gems/rake-13.1.0/exe/rake:27:in `<top (required)>'
/bin/rake:25:in `load'
/bin/rake:25:in `<main>'
Tasks: TOP => test:jobs:ourcluster

Note that I do have munge running, with the same munge.key file and permissions as on the cluster.

This last error appears to indicate that you can’t ssh into the submit_host to submit a job.

What this could be I have no idea. We’re not familiar with CAS, so we’re unlikely to be able to help in this regard. If you had another deployment with CAS, shouldn’t it just use the same configs?

Indeed, the host OOD is installed on is not a node in the cluster. Are there instructions that cover this scenario and what’s needed? I tested this as root. Can you clarify what’s needed for all users here?

No, we have not used CAS with OOD yet; this is a first attempt.

You appear to be SSHing here, so what you need is an SSH key. We use HostbasedAuthentication at OSC. I’m guessing you need to generate a key pair for this machine and distribute its public key to the other machines.
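A minimal sketch of those two steps, assuming `ourcluster.ouruni.edu` (the host named in the error above) is the submit host and that the username is the same on both machines. The function names are hypothetical and nothing runs until you call them. Check the scanned host key's fingerprint with the cluster admins before trusting it.

```shell
#!/usr/bin/env bash
# Sketch only: function names are hypothetical; the hostname comes from the error in this thread.
SUBMIT_HOST="${SUBMIT_HOST:-ourcluster.ouruni.edu}"

# Record the submit host's ED25519 host key system-wide, so strict host key
# checking can pass for every user on the OnDemand host. Run as root.
trust_submit_host() {
  ssh-keyscan -t ed25519 "$SUBMIT_HOST" >> /etc/ssh/ssh_known_hosts
}

# Create an ED25519 key pair for the current user (if there is none yet)
# and install the public key on the submit host.
install_user_key() {
  key="$HOME/.ssh/id_ed25519"
  mkdir -p "$HOME/.ssh" && chmod 700 "$HOME/.ssh"
  [ -f "$key" ] || ssh-keygen -t ed25519 -f "$key"
  ssh-copy-id -i "$key.pub" "$(id -un)@$SUBMIT_HOST"
}

# Usage (not executed automatically):
#   trust_submit_host     # once, as root
#   install_user_key      # as each user who will submit jobs
```

Per-user keys do not scale to many users, which is one reason sites use host-based authentication or a shared home directory instead.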

As for the CAS - I’m not sure. What docs we have were contributed by external users so we don’t know much, if anything, about CAS authentication.

Well this is from:
sudo su myuser -c "scl enable ondemand -- bin/rake test:jobs:cluster RAILS_ENV=production --trace"

After adding my SSH key to the login server and running the test under my user:

Job has status of running
Job has status of completed
Output file from job does not exist: /home/myuser/test_jobs/output_ourcluster_2024_10_07t11_40_37_04_00_log
Test for 'ourcluster' FAILED!
Finished testing cluster 'ourcluster'

I’m pretty sure I have the CAS settings correct. However what is the default landing page URL? I see:

 ls -l /var/www
total 0
drwxr-xr-x. 2 root root  6 Aug  6 09:10 cgi-bin
drwxr-xr-x. 2 root root 23 Sep 27 16:10 html
drwxr-xr-x. 6 root root 64 Sep 25 12:57 ood

in httpd.conf:
DocumentRoot "/var/www/html"

Going to / in a browser still lands on a blank page:

 Index of /
[ICO]	Name	Last modified	Size	Description


You should see an ood-portal.conf in /etc/httpd/conf.d. Do you see this file? If you do, you may need to modify the httpd.conf that was shipped with your distribution, for example by removing that DocumentRoot.

Indeed I do:

ls -l /etc/httpd/conf.d
total 80
-rw-r--r--. 1 root root   2916 Aug  6 09:09 autoindex.conf
-rw-r--r--  1 root root    482 Oct  4 17:31 cas.conf
-rw-r-----  1 root apache 5224 Oct  4 17:44 ood-portal.conf
-rw-r--r--. 1 root root    400 Aug  6 09:10 README
-rw-r--r--. 1 root root   8869 Sep 30 13:29 ssl.conf
-rw-r--r--. 1 root root   1252 Aug  6 09:07 userdir.conf
-rw-r--r--. 1 root root    664 Sep 25 13:43 welcome.conf

Great, what should it change to? Or should I comment it out?

Remove it or comment it, then bounce httpd.
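Something like the following would do it. This is only a sketch: `comment_out_docroot` is a hypothetical helper, and the httpd.conf path is the stock RHEL location, so adjust if yours differs.

```shell
# Sketch only: comment_out_docroot is a hypothetical helper, not an OnDemand tool.
# Comments out every active DocumentRoot line in the given file and keeps a .bak copy.
comment_out_docroot() {
  sed -i.bak -E 's/^([[:space:]]*)DocumentRoot/\1#DocumentRoot/' "$1"
}

# Usage on the OnDemand host, as root (assumed stock RHEL path):
#   comment_out_docroot /etc/httpd/conf/httpd.conf
#   apachectl configtest && systemctl restart httpd
```

Running `apachectl configtest` first means a syntax problem shows up before httpd is restarted.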

Now I get:

# Forbidden

You don't have permission to access this resource.

Should there be anything in the URL path after the FQDN?

It should have redirected anything (or empty URLs) to /pun/sys/dashboard. Is that now the URL that you see in the browser? I.e., is it redirecting you?

When it redirects you, it should prompt or redirect for CAS authentication.

Yes, now I get prompted, but then I just get this:

# Not Found

The requested URL was not found on this server.

Here is what’s in the cas.conf file (omitting the other CAS values):

<Location />
    AuthType CAS
    Require valid-user
</Location>
LoadModule auth_cas_module /usr/lib64/httpd/modules/mod_auth_cas.so

<Directory "/var/www/ood/public">
    <IfModule mod_auth_cas.c>
        AuthType CAS
    </IfModule>

    Require valid-user
</Directory>
Include conf.modules.d/*.conf

I suspect that first Location directive overrides the OnDemand one. I’m not entirely sure you need another cas.conf. What have you supplied in the ood_portal.yml as your auth?

Seems like this is what you’d need:

auth:
  - LoadModule auth_cas_module /usr/lib64/httpd/modules/mod_auth_cas.so
  - AuthType CAS
  - Require valid-user

Again, I don’t think you need the extra cas.conf, and you almost certainly don’t need the first Location directive.

I don’t know how the last Include may or may not impact the virtual host.

Yes this is what I have in ood_portal.yml

Is this error expected?
AH00534: httpd: Configuration error: No MPM loaded
I don’t see this recommended anywhere in the docs.

OK getting closer. Here was the fix for the MPM error:

Resolution
For httpd 2.4, the mpm is typically loaded by default in conf.modules.d/00-mpm.conf. That should be included in httpd.conf with the following:
Include conf.modules.d/*.conf
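A quick way to check whether a given httpd.conf still has that line active, as a sketch only (`has_modules_include` is a hypothetical helper, and the path in the usage line is the stock RHEL location):

```shell
# Sketch only: has_modules_include is a hypothetical helper.
# Succeeds if the file contains an active (uncommented) Include or
# IncludeOptional line for conf.modules.d/*.conf.
has_modules_include() {
  grep -Eq '^[[:space:]]*Include(Optional)?[[:space:]]+conf\.modules\.d/\*\.conf' "$1"
}

# Usage:
#   has_modules_include /etc/httpd/conf/httpd.conf && echo "present" || echo "missing"
```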

Now I’m getting:

AH01796: AuthType CAS configured without corresponding module

But at least now I get the redirect to ~/pun/sys/dashboard albeit with a 503 ISE.
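AH01796 usually indicates that httpd handled a request with `AuthType CAS` while mod_auth_cas was not actually loaded. A small sketch for confirming that from the output of `httpd -M` (`cas_module_listed` is a hypothetical helper):

```shell
# Sketch only: cas_module_listed is a hypothetical helper.
# Reads a module listing (the output of `httpd -M`) on stdin and
# succeeds if auth_cas_module appears in it.
cas_module_listed() {
  grep -q 'auth_cas_module'
}

# Usage on the OnDemand host:
#   httpd -M 2>/dev/null | cas_module_listed \
#     && echo "mod_auth_cas is loaded" \
#     || echo "mod_auth_cas is NOT loaded"
```

If the module is missing from the list, look for the `LoadModule auth_cas_module ...` line and make sure the file containing it is actually included.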

Where would I put these?

#  - 'CASCookiePath /var/cache/httpd/mod_auth_cas/'
#  - 'CASLoginURL https://cas.ouruni.edu/cas/login'
#  - 'CASValidateURL https://cas.ourun

Edit:
I just edited the cas.conf file to be 4 lines:

LoadModule auth_cas_module /usr/lib64/httpd/modules/mod_auth_cas.so
'CASCookiePath /var/cache/httpd/mod_auth_cas/'
'CASLoginURL https://cas.ouruni.edu/cas/login'
'CASValidateURL https://cas.ourun

In the auth section of your ood_portal.yml.

No, that kept giving me an error that CAS was not configured. I edited my post. It’s working now. I just want to see what happens when I put this on the production server.


OK, so on our production server, it appears that under Files → Home Directory it’s showing /home on the Open OnDemand server. I have to go to Cluster → Shell Access and then log in as my user.


However there are no jobs showing:

I put these in ourcluster.yml file:

     bin_overrides:
        sbatch: "/usr/bin/sbatch"
        squeue: "/usr/bin/squeue"
        scontrol: "/usr/bin/scontrol"
        scancel: "/usr/bin/scancel"

Running this suggested test:
sudo su myuser -c "scl enable ondemand -- bin/rake test:jobs:ourcluster RAILS_ENV=production --trace"

Fails with:

Output file from job does not exist: /home/myuser/test_jobs/output_ourcluster_2024_10_07t14_27_57_04_00_log
Test for 'ourcluster' FAILED!
Finished testing cluster 'ourcluster'

Here is the Slurm node job log:

[2024-10-07T14:28:27.294] task_p_slurmd_batch_request: 3764785
[2024-10-07T14:28:27.294] task/affinity: job 3764785 CPU input mask for node: 0x0000000000000C
[2024-10-07T14:28:27.294] task/affinity: job 3764785 CPU final HW mask for node: 0x00000020000002
[2024-10-07T14:28:27.295] _run_prolog: prolog with lock for job 3764785 ran for 0 seconds
[2024-10-07T14:28:27.870] Launching batch job 3764785 for UID 1822857372
[2024-10-07T14:28:27.909] [3764785.batch] task/cgroup: /slurm/uid_1822857372/job_3764785: alloc=14336MB mem.limit=14336MB memsw.limit=14336MB
[2024-10-07T14:28:27.915] [3764785.batch] task/cgroup: /slurm/uid_1822857372/job_3764785/step_batch: alloc=14336MB mem.limit=14336MB memsw.limit=14336MB
[2024-10-07T14:28:27.942] [3764785.batch] error: Could not open stdout file /home/myuser/test_jobs/output_ourcluster_2024_10_07t14_27_57_04_00_log: No such file or directory
[2024-10-07T14:28:27.942] [3764785.batch] error: IO setup failed: No such file or directory
[2024-10-07T14:28:27.951] [3764785.batch] _oom_event_monitor: oom-kill event count: 1
[2024-10-07T14:28:27.963] [3764785.batch] sending REQUEST_COMPLETE_BATCH_SCRIPT, error:0 status 256
[2024-10-07T14:28:27.966] [3764785.batch] done with job

Seems like there’s some issue with this directory. Surprised that Slurm can’t just make that directory, but is ~/test_jobs an actual directory?
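Slurm opens the stdout file on the compute node and does not create missing parent directories, so the path has to exist there, not only on the OnDemand host. Since Files shows /home of the OnDemand server, the home directory may not be shared with the cluster. A sketch for comparing both sides (`dir_status` is a hypothetical helper; the hostname is the one from the earlier error):

```shell
# Sketch only: dir_status is a hypothetical helper.
# Prints "exists" or "missing" for the given directory path.
dir_status() {
  if [ -d "$1" ]; then
    echo "exists"
  else
    echo "missing"
  fi
}

# Usage, run as the job user on the OnDemand host:
#   dir_status "$HOME/test_jobs"
#   ssh ourcluster.ouruni.edu 'ls -ld ~/test_jobs'
```

If the directory exists on the OnDemand host but not on the cluster, that points at /home not being the same filesystem on both.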