Hi, I see on the website that there are now installation instructions for Rocky Linux 9, and it's mentioned as a supported version, but the repo for rocky9 doesn't exist, e.g.
$ yum list
Open OnDemand Web Repo 864 B/s | 196 B 00:00
Errors during downloading metadata for repository 'ondemand-web':
If you look at the URL, you'll see you're hitting the develop branch of the documentation, which describes some features that are not quite finished yet, such as the packaging for Rocky 9.
OK, I took the plunge and moved to Rocky 9, so I'm keen to try Open OnDemand here, and I'd be happy to alpha or beta test for you if you like. I have a small HPC cluster and I'm currently using NICE EnginFrame for HPC desktops.
Hi Jeff,
Thanks, I have the system working now. I have version 2.1.1-1 installed and I'm able to generate a desktop connection app where the default configuration is submitted to my Slurm cluster.
I want to customize this to integrate GPU/vis nodes into bc_desktop, but following the instructions for a submit script doesn't seem to work. I have the following configuration in bc_desktop:
[root@ood bc_desktop]# ls -ltR
.:
total 4
-rw-r--r-- 1 root root 158 Feb 6 08:48 hishared.yml
drwxr-xr-x 2 root root 31 Feb 6 08:48 submit
./submit:
total 4
-rw-r--r-- 1 root root 133 Feb 3 14:38 my_submit.yml.erb
[root@ood bc_desktop]# cat hishared.yml
The system seems to be completely ignoring the submit script. Is this the correct way to customize this for this version? I thought node_type would be more appropriate for this, and I'm looking for some advice on the best way to set this up. I couldn't find a node_type example with Slurm. I want to make the default node type for bc_desktop a GPU node.
I can see from one of your comments that you have the right file permissions.
I wonder if cluster here is case sensitive. This cluster is the filename of the cluster definition in /etc/ood/config/clusters.d, so the filename should be HI_cluster.yml. Though again, I don't know if it's case sensitive; we always use lowercase across the board.
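For reference, a minimal cluster definition in that directory looks something like the following (a sketch; the title and bin path are placeholders for your site's values):

# /etc/ood/config/clusters.d/HI_cluster.yml
# The filename (minus .yml) must match the cluster attribute in your bc_desktop yml.
---
v2:
  metadata:
    title: "HI Cluster"   # placeholder display name
  job:
    adapter: "slurm"
    bin: "/usr/bin"       # placeholder path to the Slurm binaries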
OK - the default submit.yml provides almost nothing.
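For comparison, a submit override that actually changes the job usually sets native sbatch arguments under script, roughly like this (a sketch; the partition and gres values are placeholders):

# /etc/ood/config/apps/bc_desktop/submit/my_submit.yml.erb (sketch)
---
script:
  native:
    - "--partition"
    - "gpu"            # placeholder partition name
    - "--gres=gpu:1"   # request one GPU per node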
What’s the output you see in /var/log/ondemand-nginx/$USER/error.log? You should be able to search for sbatch and see execve lines. These are the commands we’re actually issuing when we submit the job.
The "--gres=gpu:1" line in native could be conflicting with gpus_per_node. It seems that the gpus_per_node configuration uses the '--gpus-per-node flag, so there could be some conflict there.
In any case, check your logs for what command and args are being issued and whether that's what you'd expect.
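If you'd rather use gpus_per_node, set it as a script attribute instead of a raw flag, something like this (a sketch; the count is arbitrary):

---
script:
  gpus_per_node: 1   # the Slurm adapter renders this with --gpus-per-node
  # don't also pass "--gres=gpu:1" in native, since the two can clash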
Hi Jeff,
On my GitHub site, I posted the error.log file from this directory (/var/log/ondemand-nginx/jmcdonal/error.log).
I see the sbatch commands and can confirm that none of the submit options are added, but I don't see any error, in particular about the YAML file.
jeff