On the server where OOD is installed, yes, this exists:
ls -l /home/myuser/test_jobs/
total 0
[myuser@openondemand ~]$ ls -ld /home/myuser/test_jobs/
drwxr-xr-x 2 rk3199 domain users 6 Oct 7 11:18 /home/myuser/test_jobs/
Now that I switched to the production server I had to create that directory.
So there must be some confusion between where OOD is installed and the actual cluster login/submit node.
I get this error:
Job has status of completed
Output file from job does not exist:
/home/myuser/test_jobs/output_ourcluster_2024_10_07t15_06_54_04_00_log
Test for 'ourcluster' FAILED!
Finished testing cluster 'ourcluster'
However the log file on production exists: /home/myuser/test_jobs/output_ourcluster_2024_10_07t15_06_54_04_00_log
The web node needs the same $HOME mount point that the cluster has. OnDemand uses the files in your $HOME to prep the job (on the web node side) and to react to what the job writes back.
For example, the job has to write what host it’s on while it is running. It writes this to a file in your $HOME that OnDemand (on the web node) reads so it knows where to proxy requests to.
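A quick way to confirm the shared $HOME described above is to write a marker file on the web node and look for it from the cluster side. This is only a sketch: the function name `check_shared_home` and the marker file name are made up for illustration, and it assumes your home path contains no single quotes.

```shell
# Hypothetical helper: returns 0 if a file written under $HOME on this host
# is visible to the command runner passed as arguments.
check_shared_home() {
    marker="$HOME/.ood_home_check.$$"
    # Write the marker on the local (web node) side.
    echo "written by $(hostname)" > "$marker" || return 2
    # "$@" is the remote runner, e.g. "ssh ourcluster.ouruni.edu".
    if "$@" "test -f '$marker'"; then
        status=0
    else
        status=1
    fi
    rm -f "$marker"
    return "$status"
}
```

Run from the web node, it would be called as `check_shared_home ssh ourcluster.ouruni.edu && echo "HOME is shared"`.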
Got it, OK, I mounted /home. What would cause this error?
The cluster config for ourcluster has a problem: (<unknown>): did not find expected key while parsing a block mapping at line 2 column 1
Edit: now it’s: The cluster config for **ourcluster** has a problem: (<unknown>): did not find expected key while parsing a block mapping at line 8 column 6
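This parser message usually means that a line's indentation does not line up with the keys around it. As a rough check, the sketch below lists lines with an odd number of leading spaces. It assumes the file is indented in steps of two spaces, which is a convention and not a rule, and the function name `flag_odd_indent` is made up for illustration.

```shell
# Print "line-number: text" for every line whose leading-space count is odd.
# Assumes two-space indentation steps (an assumption about the file's style).
flag_odd_indent() {
    awk '{
        match($0, /^ */)   # RLENGTH = number of leading spaces
        if (RLENGTH % 2 == 1) printf "%d: %s\n", NR, $0
    }' "$1"
}
```

For example, `flag_odd_indent /etc/ood/config/clusters.d/ourcluster.yml`, where the file name is a guess based on the cluster name in the error.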
You can see the exact commands we issue in /var/log/ondemand-nginx/$USER/error.log (grep for execve or squeue or similar).
You can issue these same commands to replicate. Also note that in activejobs you may have a filter turned on, such as "only show my jobs" or "only show my jobs on cluster X", in which case you may simply not have any jobs running.
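The grep suggested above could look something like this. It is a sketch only: the function name is made up, and `sbatch` is added to the pattern on the assumption that the cluster runs Slurm.

```shell
# Hypothetical helper: list the scheduler commands OnDemand logged for a user.
# With no argument it reads the per-user nginx error log mentioned above.
show_scheduler_commands() {
    log="${1:-/var/log/ondemand-nginx/$USER/error.log}"
    grep -E 'execve|squeue|sbatch' "$log"
}
```

Each matching line shows a command you can copy and run by hand as the same user to reproduce what the web node sees.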
Got job id '3764893'
Job has status of queued
Job has status of completed
Test for 'ourcluster' PASSED!
Finished testing cluster 'ourcluster'
For this to work, however, I had to create an SSH key, e.g., ssh-keygen then ssh-copy-id -i ~/.ssh...
Otherwise I get this error:
OodCore::JobAdapterError: Warning: Permanently added 'ourcluster.ouruni.edu' (ED25519) to the list of known hosts.
myuser@ourcluster.ouruni.edu: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
Is there a better way to handle this so we don’t have to tell all users to do the same?
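Whichever approach ends up being used, it helps to test whether a given user can reach the submit host without being prompted, since that is what the failing call above needs. A minimal sketch follows. The function name and the `SSH_BIN` override variable are made up for illustration, while `BatchMode` and `ConnectTimeout` are standard OpenSSH client options.

```shell
# Hypothetical helper: returns 0 if passwordless ssh to the host works.
# SSH_BIN is an illustrative override so a different ssh binary can be used.
can_ssh_noninteractive() {
    host="$1"
    # BatchMode=yes makes ssh fail instead of prompting for a password.
    "${SSH_BIN:-ssh}" -o BatchMode=yes -o ConnectTimeout=5 "$host" true
}
```

For example, `can_ssh_noninteractive ourcluster.ouruni.edu || echo "key or host-based auth not set up"`.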
So something like this. I’d be curious to see how others have done this so I’ll search around. We do use sssd but I’m not sure there’s a way to use that for this?
Yes, copy_environment and job_environment work. At OSC we have Slurm binaries on the web node itself. But we also use HostBasedAuthentication so folks can ssh here and there easily.