Hello,
We are running Open OnDemand 2.0 and trying to enable the Interactive Desktop. Our HPC cluster uses Sun Grid Engine (SGE). However, the documentation doesn’t provide instructions on how to set up the submission config file for SGE. Has anyone successfully implemented this with SGE?
At the bottom of the page, it states that if you try to launch the desktop at that point it will fail miserably, because you first need to set up the submission parameters. That is where the documentation doesn’t provide an example for SGE.
in /opt/ood/ondemand/root/usr/share/gems/2.7/ondemand/2.0.20/gems/ood_core-0.18.1/lib/ood_core/job/adapters/sge/helper.rb because I’m moving to Slurm anyway and I needed to have a proof of concept fast.
I guess the language is a little wonky. The submission parameters you’re looking for are in fact the batch_connect portion of the sge.yml that you’ve provided.
What’s the actual behaviour and/or error you’re seeing?
Jeff,
Thank you for the link on the invalid job name. However, I’m still getting the error when trying to start an interactive session.
Failed to submit session with the following error:
qsub: ERROR! argument to -N option must not contain /
If this job failed to submit because of an invalid job name please ask your administrator to configure OnDemand to set the environment variable OOD_JOB_NAME_ILLEGAL_CHARS.
The HPC Desktop session data for this session can be accessed under the staged root directory.
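For anyone else reading the error above: qsub rejects the job name because it contains a /, and OOD_JOB_NAME_ILLEGAL_CHARS is the variable that tells OnDemand which characters to strip out of the name. Below is a minimal Python sketch of that kind of replacement, assuming each run of illegal characters is swapped for a hyphen. The function name and the example job name are hypothetical, and this is not OnDemand's actual code.

```python
import re


def sanitize_job_name(name, illegal_chars, replacement="-"):
    """Replace each run of illegal characters in a job name.

    Hypothetical helper for illustration only; it mimics the idea behind
    OOD_JOB_NAME_ILLEGAL_CHARS, not the real ood_core implementation.
    """
    if not illegal_chars:
        # Nothing configured, so the name is passed through unchanged.
        return name
    # Build a character class from the configured characters, escaping
    # anything that is special inside a regular expression.
    pattern = "[" + re.escape(illegal_chars) + "]+"
    return re.sub(pattern, replacement, name)


# Hypothetical job name containing the "/" that qsub -N refuses.
print(sanitize_job_name("sys/dashboard/sys/bc_desktop", "/"))
# prints: sys-dashboard-sys-bc_desktop
```

With the variable unset (an empty string in this sketch) the name keeps its slashes, which matches the qsub error quoted above.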
Thanks for that. I got past the -N error and it has submitted my request! However, my submitted job is stuck as pending… There are no other jobs running in the test.q. The error is not helpful at all:
(-l h_rt=3600) cannot run in queue “test.q” because of cluster queue
It doesn’t show a script being executed. Anyway, I tried the following via the CLI, specifying the job_script_content.sh file to run. It again put the job in as pending and the state never changed:
job_script_content.sh, as you’ve indicated, is the entrypoint. I think we pass it via stdin to a lot of schedulers.
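To illustrate what passing the script via stdin means, here is a rough Python sketch. It assumes the scheduler command reads the job script from standard input, which qsub does when no script file is given. The function name is hypothetical, and a stand-in command that echoes its input is used so the example runs without SGE installed.

```python
import subprocess
import sys


def submit_via_stdin(script_text, command):
    """Run `command`, feeding the job script to it on standard input.

    Hypothetical helper: on an SGE host, `command` could be something like
    ["qsub", "-q", "test.q", "-l", "h_rt=3600"].
    """
    result = subprocess.run(
        command,
        input=script_text,      # the script content goes to stdin
        capture_output=True,
        text=True,
        check=True,             # raise if the command exits non-zero
    )
    return result.stdout


# Stand-in for the scheduler: a tiny Python process that echoes stdin.
echo_stdin = [sys.executable, "-c",
              "import sys; sys.stdout.write(sys.stdin.read())"]

script = "#!/bin/bash\necho hello from the job\n"
print(submit_via_stdin(script, echo_stdin))
```

Swapping the stand-in for the real qsub command reproduces the submission outside of OnDemand, which is a quick way to tell a scheduler problem from an OnDemand one.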
At this point it’s an SGE thing, and I’m not super familiar with that system. As you can see, it has the same behavior through a shell, meaning that when you remove OOD from the equation, you still have this issue.
It doesn’t seem like you’re able to submit to that queue under any circumstance. I’d figure out how to get this working with just the CLI, then tackle what’s going on with OOD.
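One thing that may be worth checking from the CLI, given the message about -l h_rt=3600 and the cluster queue, is whether the queue's own h_rt limit is lower than what the job requests. This is only one possible cause. On an SGE host, qconf -sq test.q prints the queue configuration and qstat -j followed by the job ID prints scheduling details. The sketch below parses an h_rt line from that kind of output. The sample text is made up for illustration, the function names are hypothetical, and only plain seconds, HH:MM:SS and INFINITY values are handled.

```python
def parse_h_rt(qconf_output):
    """Return the queue's h_rt limit in seconds.

    Returns float("inf") for INFINITY and None if no h_rt line is found.
    Hypothetical helper; assumes "h_rt <value>" lines as in qconf -sq output.
    """
    for line in qconf_output.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "h_rt":
            value = parts[1]
            if value.upper() == "INFINITY":
                return float("inf")
            if ":" in value:
                hours, minutes, seconds = (int(x) for x in value.split(":"))
                return hours * 3600 + minutes * 60 + seconds
            return int(value)
    return None


def request_fits(requested_seconds, limit_seconds):
    """True if the requested runtime does not exceed the queue limit."""
    return limit_seconds is not None and requested_seconds <= limit_seconds


# Made-up sample, NOT real output from the cluster in this thread.
sample = "qname                 test.q\nh_rt                  00:30:00\n"

limit = parse_h_rt(sample)
print(limit)                      # prints: 1800
print(request_fits(3600, limit))  # prints: False
```

If the real limit turns out to be INFINITY or larger than 3600 seconds, the queue's h_rt setting is not the cause and the qstat -j output is the next place to look.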