Rocky 9 installation

jmcdonal · January 26, 2023, 2:21pm

Hi, I see on the website that there are now installation instructions for Rocky Linux 9 and it’s mentioned as a supported version, but the repo for Rocky 9 doesn’t exist, e.g.
$ yum list
Open OnDemand Web Repo 864 B/s | 196 B 00:00
Errors during downloading metadata for repository 'ondemand-web':

Status code: 404 for https://yum.osc.edu/ondemand/2.0/web/el9/x86_64/repodata/repomd.xml (IP: 192.148.247.165)
Error: Failed to download metadata for repo 'ondemand-web': Cannot download repomd.xml: Cannot download repodata/repomd.xml: All mirrors were tried

and the yum repo file :

name=Open OnDemand Web Repo
baseurl=https://yum.osc.edu/ondemand/2.0/web/el$releasever/$basearch/

The URL isn’t valid. Is there a pending release for Rocky 9?
Here is a snapshot of the site I’m looking at:

Thanks,
Jeff
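A minimal sketch of how the `baseurl` variables expand into the metadata URL that failed above. The function name `repomd_url` is made up for illustration; the URL layout is copied from the repo file and the 404 message in the post.

```shell
# Build the metadata URL that yum/dnf would request for a given release.
# The layout mirrors the baseurl quoted in the post:
#   https://yum.osc.edu/ondemand/2.0/web/el$releasever/$basearch/
repomd_url() {
  ood_version=$1   # e.g. 2.0
  releasever=$2    # e.g. 9
  basearch=$3      # e.g. x86_64
  printf 'https://yum.osc.edu/ondemand/%s/web/el%s/%s/repodata/repomd.xml\n' \
    "$ood_version" "$releasever" "$basearch"
}

# Print the URL for Rocky 9 on x86_64.
repomd_url 2.0 9 x86_64

# To probe it by hand (needs network access), something like:
# curl -sI "$(repomd_url 2.0 9 x86_64)" | head -n 1
```

Checking the status line for each `el` release you care about shows quickly whether packages have been published for it.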

travert · January 26, 2023, 2:54pm

Hello and welcome!

If you look in the URL you are hitting the develop branch, which has some features documented that are not quite finished yet, such as the packaging for Rocky 9.

To see the current features that are supported you want to hit the latest version on this url:
https://osc.github.io/ood-documentation/latest/

Sorry for the confusion!

jmcdonal · January 26, 2023, 3:05pm

OK, I took the plunge and moved to Rocky 9, so I’m keen to try Open OnDemand here, and I’d be happy to beta or alpha test for you if you like. I have a small HPC cluster and I’m currently using NICE EnginFrame for HPC desktops.

travert · January 26, 2023, 3:07pm

OK, @tdockendorf or @jeff.ohrstrom would know more about the packaging of that and its current state.

jeff.ohrstrom · January 26, 2023, 3:33pm

Yes, RHEL 9 is slated for the next release. I can tell by the URL that you’re looking at the next release’s documentation.

The 2.1 nightlies are mostly stable AFAIK. Here’s how to install the nightly version.

rpm -i https://yum.osc.edu/ondemand/latest/ondemand-release-web-latest-1-7.noarch.rpm
sed /etc/yum.repos.d/ondemand-web.repo -e 's/latest/nightly/g' \
          -e 's/ondemand-web/ondemand-web-nightly/g' > /etc/yum.repos.d/ondemand-nightly-web.repo

# only applicable for RHEL 8 or above.
yum module enable ruby:3.0 nodejs:14
yum install ondemand
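To preview what the `sed` step does before writing anything under /etc/yum.repos.d, the same two substitutions can be run on stdin. This is only a sketch: `to_nightly` is a made-up helper name, and the sample repo text is illustrative rather than a copy of the real ondemand-web.repo.

```shell
# Rewrite a repo definition from the "latest" channel to "nightly",
# reading stdin and writing stdout so nothing on disk changes.
to_nightly() {
  sed -e 's/latest/nightly/g' -e 's/ondemand-web/ondemand-web-nightly/g'
}

# Illustrative input only; the real file is /etc/yum.repos.d/ondemand-web.repo.
printf '%s\n' \
  '[ondemand-web]' \
  'name=Open OnDemand Web Repo' \
  'baseurl=https://yum.osc.edu/ondemand/latest/web/el$releasever/$basearch/' \
  | to_nightly
```

The section name and the channel in `baseurl` change; the human-readable `name=` line is left alone because the second pattern is case-sensitive.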

jmcdonal · February 6, 2023, 3:38pm

Hi Jeff,
Thanks, I have the system working now. I have version 2.1.1-1 installed and I’m able to generate a desktop connection app where the default configuration is submitted to my slurm cluster.

I want to customize this to integrate gpus/vis nodes into the bc_desktop but following the instructions for a submit script doesn’t seem to work. I have the following configuration on the bc_desktop:

[root@ood bc_desktop]# ls -ltR
.:
total 4
-rw-r--r-- 1 root root 158 Feb 6 08:48 hishared.yml
drwxr-xr-x 2 root root 31 Feb 6 08:48 submit

./submit:
total 4
-rw-r--r-- 1 root root 133 Feb 3 14:38 my_submit.yml.erb
[root@ood bc_desktop]# cat hishared.yml

title: "HI Shared Desktop"
cluster: "HI_cluster"
submit: "submit/my_sumbit.yml.erb"
attributes:
  desktop: "xfce"
  bc_account: null
  bc_queue: oldgpu
[root@ood bc_desktop]# cat submit/my_submit.yml.erb

script:
  gpus_per_nodes: 1
  native:
    - "-n"
    - "<%= bc_num_slots.blank? ? 1 : bc_num_slots.to_i %>"
    - "--gres=gpu:1"

The system seems to be completely ignoring the submit script. Is this the correct way to customize this for this version? I thought the node_type would be more appropriate for this, and I am looking for some advice on the best way to set this up. I couldn’t find a node_type example with Slurm. I want to make the default node type for the bc_desktop a GPU node.

Thanks in advance.
Jeff

jeff.ohrstrom · February 6, 2023, 4:09pm

I can’t tell from what you posted, but the first thing I’d ask is whether it’s well-formatted YAML.

script:
  gpus_per_nodes: 1
  native:
    - "-n"
    - "<%= bc_num_slots.blank? ? 1 : bc_num_slots.to_i %>"
    - "--gres=gpu:1"

For reference, these are the production configurations we use at OSC, so I know for sure you can redefine the submit attribute.

jmcdonal · February 6, 2023, 5:03pm

I posted my code here:

I ran both through yaml linters and I don’t see any issue. I don’t have to update the portal after I make changes to these files, right?

jeff.ohrstrom · February 6, 2023, 5:08pm

No, but a hard page refresh would be in order (ctrl + shift + R).

First, I’d suggest quoting everything. I’m not sure that’s the issue, but just in case.

Secondly, I’d check file permissions. I see you’re editing them as root but regular users should be able to see & read the files.
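A small sketch of the permission check suggested here. `world_readable` is a hypothetical helper; it only inspects the "other" read bit from `ls -ld` output and does not account for ACLs.

```shell
# Succeed if users outside the owner and group can read the given file.
# In `ls -ld` output, characters 8-10 of the mode string are the "other"
# permissions, so character 8 is the "other" read bit.
world_readable() {
  perms=$(ls -ld -- "$1" 2>/dev/null | cut -c8)
  [ "$perms" = "r" ]
}

# File names taken from the listing earlier in the thread.
for f in hishared.yml submit/my_submit.yml.erb; do
  if world_readable "$f"; then
    echo "ok: $f"
  else
    echo "not world-readable (or missing): $f"
  fi
done
```

Every directory on the path also needs the execute (search) bit for other users, so a readable file inside a closed directory is still unreachable.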

jeff.ohrstrom · February 6, 2023, 5:14pm

I see you have the right file permissions in one of your comments.

I wonder if cluster here is case sensitive. This cluster is the filename of the cluster definition in /etc/ood/config/clusters.d, so the filename should be HI_cluster.yml. Though again, I don’t know if it’s case sensitive; we always use lowercase across the board.

github.com

jmcdonal/bc_desktop/blob/4bc2269a7c8975767ccc8083a421ab78b76f3393/hishared.yml#L3


      
---
title: "HI Shared Desktop"
cluster: "HI_cluster"
submit: submit/my_sumbit.yml.erb
attributes:
  desktop: "xfce"
  bc_account: null
  bc_queue: oldgpu
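One way to check the cluster name against the files in clusters.d, as a rough sketch. `cluster_file_exists` is a made-up helper that reads the `cluster:` line with `sed` instead of a real YAML parser, so it only handles a simple top-level `cluster: "name"` line.

```shell
# Succeed if the cluster named in a form config has a matching <name>.yml
# in the clusters directory. Line-based sketch, not a real YAML parser.
cluster_file_exists() {
  form=$1          # e.g. hishared.yml
  clusters_dir=$2  # e.g. /etc/ood/config/clusters.d
  name=$(sed -n 's/^cluster:[[:space:]]*//p' "$form" | head -n 1 | tr -d "\"' ")
  [ -n "$name" ] && [ -f "$clusters_dir/$name.yml" ]
}

# Usage, with paths taken from the posts above:
# cluster_file_exists hishared.yml /etc/ood/config/clusters.d \
#   && echo "cluster file found" || echo "no matching cluster file"
```

On a case-sensitive filesystem this test distinguishes HI_cluster.yml from hi_cluster.yml, which is the situation being asked about; whether Open OnDemand itself compares names case-sensitively is left open in the thread.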

jmcdonal · February 6, 2023, 5:33pm

Thanks for looking at this.
I added my clusters.d/HI_cluster.yml configuration file to the GitHub site.
The filenames match. I only have one cluster.

jeff.ohrstrom · February 6, 2023, 5:45pm

OK - the default submit.yml provides almost nothing.

What’s the output you see in /var/log/ondemand-nginx/$USER/error.log? You should be able to search for sbatch and see execve lines. These are the commands we’re actually issuing when we submit the job.

The "--gres=gpu:1" line in native could be conflicting with gpus_per_node. It seems that the gpus_per_node configuration uses the --gpus-per-node flag, so there could be some conflict there.

In any case, check your logs for what command + args are being issued and if that’s what you’d expect.
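A quick way to pull those lines out of the log, as a sketch. `sbatch_calls` is a hypothetical helper; it assumes nothing about the log format beyond the words `execve` and `sbatch` appearing on the same line, as described above.

```shell
# Print log lines that mention both execve and sbatch, with line numbers.
sbatch_calls() {
  grep -n 'execve' "$1" | grep 'sbatch'
}

# Usage, with the log path from the post above:
# sbatch_calls "/var/log/ondemand-nginx/$USER/error.log"
#
# To count how many of those submissions carried a given option:
# sbatch_calls "/var/log/ondemand-nginx/$USER/error.log" | grep -c -- '--gres'
```

A count of zero for an option you configured suggests the submit file was never applied, as opposed to being applied and then rejected by the scheduler.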

jmcdonal · February 6, 2023, 5:52pm

Hi Jeff,
On my GitHub site, I posted the error.log file from this directory (/var/log/ondemand-nginx/jmcdonal/error.log).

I see the sbatch commands and I can confirm that none of the submit options are added, but I don’t see any error, in particular about the YAML file.
Jeff

jeff.ohrstrom · February 6, 2023, 6:24pm

You’ve misspelled the submit argument.

I knew it was something obvious that we were missing! You’ve spelled it S-U-M-B-I-T.
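Since no error was logged for the mistyped path in this thread, a small check like the one below can catch it early. It is a sketch only: `check_submit` is a made-up helper, it reads the `submit:` line with `sed` instead of parsing YAML, and it assumes the path is relative to the app directory, as in the layout shown earlier.

```shell
# Report whether the file named by "submit:" in a form config exists.
# The path is resolved relative to the app directory. Line-based sketch,
# not a real YAML parser.
check_submit() {
  app_dir=$1   # directory holding the form config and the submit/ folder
  form=$2      # form config file name, e.g. hishared.yml
  target=$(sed -n 's/^submit:[[:space:]]*//p' "$app_dir/$form" | head -n 1 | tr -d "\"' ")
  if [ -z "$target" ]; then
    echo "no submit: key in $form"
  elif [ -f "$app_dir/$target" ]; then
    echo "ok: $target"
  else
    echo "missing: $target"
    return 1
  fi
}

# Recreate the thread's layout in a scratch directory and run the check.
demo=$(mktemp -d)
mkdir "$demo/submit"
: > "$demo/submit/my_submit.yml.erb"
printf 'submit: "submit/my_sumbit.yml.erb"\n' > "$demo/hishared.yml"
check_submit "$demo" hishared.yml || echo "the submit: path does not match any file"
rm -rf "$demo"
```

Run against the real app directory, the same function would flag `my_sumbit.yml.erb` as missing while `my_submit.yml.erb` sits next to it.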

jmcdonal · February 6, 2023, 6:44pm

Well, that’s embarrassing!
Thanks.

