CAS integration: (was 401 Unauthorized) now is: Index of /

On the server where OOD is installed, yes, this directory exists:

ls -l /home/myuser/test_jobs/
total 0
[myuser@openondemand ~]$ ls -ld /home/myuser/test_jobs/
drwxr-xr-x 2 myuser domain users 6 Oct  7 11:18 /home/myuser/test_jobs/

Now that I've switched to the production server, I had to create that directory.

So there must be some confusion between where OOD is installed and the actual cluster login/submit node.

I get this error:

Job has status of completed

Output file from job does not exist: 

/home/myuser/test_jobs/output_ourcluster_2024_10_07t15_06_54_04_00_log
Test for 'ourcluster' FAILED!
Finished testing cluster 'ourcluster'

However the log file on production exists:
/home/myuser/test_jobs/output_ourcluster_2024_10_07t15_06_54_04_00_log

And its contents:
TEST A B C

Is there a misconfiguration?

The web node needs the same $HOME mount point that the cluster has. OnDemand uses the files in your $HOME to prep the job (on the web node side) and to read information back from the job.

For example, the job has to write out what host it's running on. It writes this to a file in your $HOME that OnDemand (on the web node) reads so it knows where to proxy requests to.
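
One quick way to confirm both sides see the same $HOME is to compare the mounts; a rough sketch (hostnames are just this thread's placeholders):

# On the OOD web node: note which filesystem backs $HOME
df -hT "$HOME"
ls -ld "$HOME/test_jobs"

# On the submit host, the same check should show the same shared export
ssh ourcluster.ouruni.edu 'df -hT "$HOME"; ls -ld "$HOME/test_jobs"'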

Got it, OK. I mounted /home. What would cause this error?

The cluster config for ourcluster has a problem: (<unknown>): did not find expected key while parsing a block mapping at line 2 column 1

Edit: now it’s:
The cluster config for **ourcluster** has a problem: (<unknown>): did not find expected key while parsing a block mapping at line 8 column 6

 1 ---
 2 v2:
 3    metadata:
 4	title: "Ourcluster"
 5    login:
 6	host: "ourcluster.ouruni.edu"
 7    job:
 8	adapter: "slurm"
 9	submit_host: "ourcluster.ouruni.edu"
10	ssh_hosts:
11        - ourcluster.ouruni.edu
12	cluster: Ourcluster

Is there an indentation problem?

Yes, it should have this format. I pulled this directly from the documentation.

---
v2:
   metadata:
     title: "My Cluster"
   login:
     host: "my_cluster.my_center.edu"
   job:
     adapter: "slurm"
     cluster: "Ourcluster"
     conf: "/path/to/slurm.conf"
     submit_hosts:
       - ourcluster.ouruni.edu
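
If you want to sanity-check the file outside of OnDemand, Ruby's own YAML parser reports the same "line X column Y" errors; a sketch, assuming the default clusters.d path:

# Psych raises the same "did not find expected key ... at line X column Y" message on bad indentation
ruby -ryaml -e 'YAML.load_file("/etc/ood/config/clusters.d/ourcluster.yml")' && echo "parsed OK"

# YAML forbids tabs for indentation; this flags any lines that start with one
grep -nP '^\t' /etc/ood/config/clusters.d/ourcluster.yml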

I used a JSON tester and it validated. What could be causing the error?

The cluster config for **ourcluster** has a problem: (<unknown>): did not find expected key while parsing a block mapping at line 8 column 6

---
v2:
   metadata:
     title: "Axon"
   login:
     host: "ourcluster.ouruni.edu"
   job:
     cluster: "Ourcluster"
     adapter: "slurm"
     conf: "/etc/slurm/slurm.conf"
     submit_hosts:
       - ourcluster.ouruni.edu
     bin: "/sbin" 
     bin_overrides:
       sbatch: "/usr/bin/sbatch"
       squeue: "/usr/bin/squeue"
       scontrol: "/usr/bin/scontrol"
       scancel: "/usr/bin/scancel"
     strict_host_checking: false
     copy_environment: false

Line 8 is the cluster: "Ourcluster" line

Maybe you need to restart your webserver (in the help menu) to pick up the new/valid configs?

Indeed that got me past the error. I thought systemctl restart httpd would take care of that.
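
Right, systemctl restart httpd only restarts Apache; the cluster YAMLs are read by the apps running under your Per-User NGINX (PUN), which is what the Help menu's Restart Web Server item restarts. From a shell, something along these lines (a sketch; the nginx_stage path and subcommands can differ by version/install):

# Show whether your PUN is still running with the old config loaded
pgrep -u "$USER" -af nginx

# Clean up idle per-user NGINX instances so they come back with fresh configs
sudo /opt/ood/nginx_stage/sbin/nginx_stage nginx_clean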

How can I troubleshoot why jobs are not displaying ("No data available in table")?

sudo su myuser -c 'scl enable ondemand -- bin/rake test:jobs:ourcluster RAILS_ENV=production'
Testing cluster 'ourcluster'...
Submitting job...
rake aborted!
Errno::ENOENT: No such file or directory - /usr/bin/sbatch

It’s not seeing sbatch on the login/submit node?
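
A quick check is to compare what the web node finds locally with what the submit host sees over SSH (a sketch; adjust the hostname):

# On the web node: is there a local sbatch at all?
command -v sbatch || echo "no sbatch on the web node"

# On the submit host, via the same SSH path OnDemand would use:
ssh ourcluster.ouruni.edu 'command -v sbatch && sbatch --version'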

You can see the exact commands we issue in /var/log/ondemand-nginx/$USER/error.log (grep for execve or squeue or similar).

You can issue these same commands to replicate. Also note that in Active Jobs you may have a filter turned on, like "only show my jobs" or "only show jobs on cluster X", in which case you don't actually have any jobs to show.
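
To pull the commands out of the log, something like this (a sketch, assuming the default per-user log location):

# Each command OnDemand runs shows up as an execve(...) line in your PUN error log
grep -E 'execve|sbatch|squeue' "/var/log/ondemand-nginx/$USER/error.log" | tail -n 20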

OK why is it not finding these commands on the actual submit node?

App 829031 output: [2024-10-08 10:05:58 -0400 ] WARN "Error opening MOTD at \nException: bad URI(is not URI?): nil"

App 829031 output: [2024-10-08 10:06:10 -0400 ] ERROR "Errno::ENOENT: No such file or directory - /usr/bin/squeue\n/usr/share/ruby/open3.rb:222:in spawn'\n/usr/share/ruby/open3.rb:222:in popen_run'\n/usr/share/ruby/open3.rb:103:in popen3'\n/usr/share/ruby/open3.rb:290:in

Edit: now seeing this error:

Testing cluster 'ourcluster'...
Submitting job...
rake aborted!
OodCore::JobAdapterError: hostname contains invalid characters

What characters are invalid?

According to the docs here, the option is submit_host, but your example has submit_hosts. Which is correct?

Sorry, the docs are always right. I copied it and started to restructure it to look more like yours.
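
With the documented singular key, the config ends up roughly like the sketch below; it mirrors the file from earlier in the thread but swaps the submit_hosts list for a single submit_host string (written as a heredoc so the whole file is in one place, and all paths/hostnames are just this thread's examples):

sudo tee /etc/ood/config/clusters.d/ourcluster.yml > /dev/null <<'EOF'
---
v2:
  metadata:
    title: "Ourcluster"
  login:
    host: "ourcluster.ouruni.edu"
  job:
    adapter: "slurm"
    cluster: "Ourcluster"
    conf: "/etc/slurm/slurm.conf"
    # submit_host takes a single hostname string, not a list
    submit_host: "ourcluster.ouruni.edu"
    bin_overrides:
      sbatch: "/usr/bin/sbatch"
      squeue: "/usr/bin/squeue"
      scontrol: "/usr/bin/scontrol"
      scancel: "/usr/bin/scancel"
    strict_host_checking: false
    copy_environment: false
EOF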

Progress!

Got job id '3764893'
Job has status of queued
Job has status of completed
Test for 'ourcluster' PASSED!
Finished testing cluster 'ourcluster'

For this to work, however, I had to create an SSH key, e.g., ssh-keygen and then ssh-copy-id -i ~/.ssh... (a sketch follows the error below).
Otherwise I get this error:

OodCore::JobAdapterError: Warning: Permanently added 'ourcluster.ouruni.edu' (ED25519) to the list of known hosts.
myuser@ourcluster.ouruni.edu: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
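
For reference, the per-user workaround amounts to something like this (a sketch; key type and paths are only examples):

# On the web node: generate a key and authorize it on the submit host
ssh-keygen -t ed25519 -f ~/.ssh/id_ed25519 -N ''
ssh-copy-id -i ~/.ssh/id_ed25519.pub myuser@ourcluster.ouruni.edu

# Verify the non-interactive SSH that OnDemand relies on now works
ssh -o BatchMode=yes ourcluster.ouruni.edu hostname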

Is there a better way to handle this so we don't have to tell all users to do the same?

Yes, sshd supports HostbasedAuthentication, in which the servers themselves have key pairs, not any given user.
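
In OpenSSH terms that looks roughly like the following; a sketch of plain host-based auth, not an OnDemand-specific recipe, with openondemand.ouruni.edu standing in for the web node:

# On the submit host (ourcluster.ouruni.edu), enable it in /etc/ssh/sshd_config:
#   HostbasedAuthentication yes
# then trust the web node by name and by host key:
echo "openondemand.ouruni.edu" | sudo tee -a /etc/ssh/shosts.equiv
ssh-keyscan openondemand.ouruni.edu | sudo tee -a /etc/ssh/ssh_known_hosts
sudo systemctl reload sshd

# On the web node, in /etc/ssh/ssh_config (or a drop-in):
#   HostbasedAuthentication yes
#   EnableSSHKeysign yes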

So something like this. I’d be curious to see how others have done this so I’ll search around. We do use sssd but I’m not sure there’s a way to use that for this?

Edit: I see a thread about using munge. Is this still an option? Or this wrapper for version 1.5?

jeff.ohrstrom, do you know if this wrapper will still work in OOD 3.1?

Yes, copy_environment and job_environment work. At OSC we have the Slurm binaries on the web node itself. But we also use HostbasedAuthentication so folks can ssh here and there easily.