Hi, I see on the website that there are now installation instructions for Rocky Linux 9, and it’s mentioned as a supported version, but the repo for rocky9 doesn’t exist, e.g.
$ yum list
Open OnDemand Web Repo 864 B/s | 196 B 00:00
Errors during downloading metadata for repository ‘ondemand-web’:
OK, I took the plunge and moved to Rocky9, so I’m keen to try Open OnDemand here, and I’d be happy to alpha or beta test for you if you like. I have a small HPC cluster and I’m currently using NICE EnginFrame for HPC desktops.
The system seems to be completely ignoring the submit script. Is this the correct way to customize this for this version? I thought node_type would be more appropriate for this, and I am looking for some advice on the best way to set this up. I couldn’t find a node_type example with Slurm. I want to make the default node type for bc_desktop a GPU node.
I see you have the right file permissions in one of your comments.
I wonder if cluster here is case sensitive. This cluster is the filename of the cluster definition in /etc/ood/config/clusters.d, so the filename should be HI_cluster.yml. Though again, I don’t know if it’s case sensitive; we always use lowercase across the board.
OK - the default submit.yml provides almost nothing.
What’s the output you see in /var/log/ondemand-nginx/$USER/error.log? You should be able to search for sbatch and see execve lines. These are the commands we’re actually issuing when we submit the job.
The "--gres=gpu:1" line in native could be conflicting with gpus_per_node. It seems that the gpus_per_node configuration uses the "--gpus-per-node" flag, so the two requests could clash.
In any case, check your logs for what command + args are being issued and if that’s what you’d expect.
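A minimal sketch of that log check, in case it helps. It assumes the per-user log lives at the path quoted above and that the submit lines mention both execve and sbatch; the function name show_sbatch_calls is made up for illustration and is not part of Open OnDemand.

```shell
# Hypothetical helper (not part of Open OnDemand): print the most recent
# log lines that mention both "execve" and "sbatch".
# The default path is the one quoted in this thread; pass another path
# as the first argument if your site logs elsewhere.
show_sbatch_calls() {
  local log_file="${1:-/var/log/ondemand-nginx/$USER/error.log}"
  # First keep the execve lines, then narrow to sbatch, then show the last 20.
  grep 'execve' "$log_file" | grep 'sbatch' | tail -n 20
}

# Usage (uncomment to run against your own log):
# show_sbatch_calls
# show_sbatch_calls /path/to/another/error.log
```

If both --gres and --gpus-per-node show up in the same sbatch line, that would point to the conflict mentioned above.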