Rocky 9 installation

Hi, I see on the website that there are now installation instructions for Rocky Linux 9 and its mentioned as a supported version but the repo for rocky9 doesn’t exist, e.g.
$ yum list
Open OnDemand Web Repo 864 B/s | 196 B 00:00
Errors during downloading metadata for repository ‘ondemand-web’:

and the yum repo file :

name=Open OnDemand Web Repo
baseurl=https://yum.osc.edu/ondemand/2.0/web/el$releasever/$basearch/

The URL isn’t valid. Is there a pending release for Rocky 9?
here is an snapshot of the site I’m looking at:

Thanks,
Jeff

Hello and welcome!

If you look in the URL you are hitting the develop branch, which has some features documented that are not quite finished yet, such as the packaging for Rocky 9.

To see the current features that are supported you want to hit the latest version on this url:
https://osc.github.io/ood-documentation/latest/

Sorry for the confusion!

OK, I took the plunge and moved to Rocky9 so I’m keen to try open ondemand here and I’d be happy to beta or alpha test for you if you like. I have a small hpc cluster and I’m currently using enginframe nice for hpc desktops.

Ok, @tdockendorf or @jeff.ohrstrom would know more about the packaging of that and it’s current state.

Yes RHEL/9 is slated for the next release. I can tell by the URL you’re looking at the next release’s documentation.

The 2.1 nightlies are mostly stable AFAIK. Here’s how to install the nightly version.

rpm -i https://apt.osc.edu/ondemand/latest/ondemand-release-web-latest-1-7.noarch.rpm
sed /etc/yum.repos.d/ondemand-web.repo -e 's/latest/nightly/g' \
          -e 's/ondemand-web/ondemand-web-nightly/g' > /etc/yum.repos.d/ondemand-nightly-web.repo

# only applicable for RHEL 8 or above.
yum module enable ruby:3.0 nodejs:14
yum install ondemand

Hi Jeff,
Thanks, I have the system working now. I have version 2.1.1-1 installed and I’m able to generate a desktop connection app where the default configuration is submitted to my slurm cluster.

I want to customize this to integrate gpus/vis nodes into the bc_desktop but following the instructions for a submit script doesn’t seem to work. I have the following configuration on the bc_desktop:

[root@ood bc_desktop]# ls -ltR
.:
total 4
-rw-r–r-- 1 root root 158 Feb 6 08:48 hishared.yml
drwxr-xr-x 2 root root 31 Feb 6 08:48 submit

./submit:
total 4
-rw-r–r-- 1 root root 133 Feb 3 14:38 my_submit.yml.erb
[root@ood bc_desktop]# cat hishared.yml

title: “HI Shared Desktop”
cluster: “HI_cluster”
submit: “submit/my_sumbit.yml.erb”
attributes:
desktop: “xfce”
bc_account: null
bc_queue: oldgpu
[root@ood bc_desktop]# cat submit/my_submit.yml.erb

script:
gpus_per_nodes: 1
native:
- “-n”
- “<%= bc_num_slots.blank? ? 1 : bc_num_slots.to_i %>”
- “–gres=gpu:1”


The system seems to be completely ignoring the submit script. Is this the correct way to customize this for this version? I thought the the node_type would be more appropriate for this and I am looking some advice the best way to set this up. I couldn’t find a node_type example with slurm. I want to make the default node type for the bc_desktop a gpu node.

Thanks in advance.
Jeff

I can’t tell from what you posted, but the first thing I’d think to ask is to be sure it’s well formatted YAML.

script:
  gpus_per_nodes: 1
  native:
    - "-n"
    - "<%= bc_num_slots.blank? ? 1 : bc_num_slots.to_i %>"
    - "--gres=gpu:1"

For reference, these are the production configurations we use at OSC, so I know for sure you can redefine the submit attribute.

I posted my code here:

I ran both through yaml linters and I don’t see any issue. I don’t have to update the portal after I make changes to these files, right?

No, but a hard page refresh would be in order (ctrl + shift + R).

First - I’d suggest quoting everything. Just to be sure - I’m not sure if that’s the issue, but just in case.

Secondly, I’d check file permissions. I see you’re editing them as root but regular users should be able to see & read the files.

I see you have the right file permissions in one of your comments.

I wonder if cluster here is case sensitive. This cluster is the filename of the cluster definition in /etc/ood/config/clusters.d, so the filename should be HI_cluster.yml. Though again, I don’t know if it’s case sensitive, we always use lowercase across the board.

Thanks for looking at this.
I added my clusters.d/HI_cluster.yml configuration file to the github site.
the filenames match. I only have one cluster.

OK - the default submit.yml provides almost nothing.

What’s the output you see in /var/log/ondemand-nginx/$USER/error.log? You should be able to search for sbatch and see execve lines. These are the commands we’re actually issuing when we submit the job.

The "--gres=gpu:1" line in native could be conflicting with gpus_per_node. It seems that the gpus_per_node configuration uses the '--gpus-per-node flag, so there could be some conflict there.

In any case, check your logs for what command + args are being issued and if that’s what you’d expect.

Hi Jeff,
on my github site, I posted the error.log file from this directory (/var/log/ondemand-nginx/jmcdonal/error.log.

I see the sbatch commands and I can confirm that none of the submit options are added but I don’t see any error, in particular about the yaml file.
jeff

You’ve misspelled the submit argument.

I knew it was something obvious that we were missing! You’ve spelled it S-U-M-B-I-T.

well, thats embarrassing!
Thanks.

This topic was automatically closed 180 days after the last reply. New replies are no longer allowed.