Ubuntu support?

Can I ask for Ubuntu support? Or are there any install instructions for Ubuntu? I did not see an “install from source” option either, which is a bit surprising given that the project has HPC roots.

@dipeit time permitting we can look into what it would take to support Debian-based systems natively. We have not had much demand for this so far. Can you tell us a little more about your architecture / site?

As for installation from source: our installation process was very manual up until our 1.3 release, which is where we cut over to RPM-based installation. Those installation instructions are dated, but still available for older versions: https://osc.github.io/ood-documentation/release-1.2/installation.html

I am also interested in an Ubuntu/Debian installation document or packages. Our cluster runs Ubuntu, along with most of our support servers. I can provide additional architecture information and/or some testing help if necessary.

@bmcgough and @dipeit what version(s) of Ubuntu are you running?

Folks,

I spent the second half of my Thursday experimenting with installation on Ubuntu and found that the installation of system dependencies is going to be complicated. OnDemand uses SCLs which Ubuntu does not have. Installation then becomes a task of trying to ensure compatible versions of the OnDemand apps, Ruby, Passenger and Apache. Ubuntu 18.x LTS provides Ruby 2.5 by default which the OOD apps have not been tested against. Snap can be used to install Ruby 2.4 but the available versions of Apache and Passenger require the Apt-provided Ruby 2.5, and there are no Snap alternatives for Apache or Passenger. Tangentially, I have read anecdotal evidence that Snap’d software takes up more storage and is slower to boot.

I’m happy to announce that we’ve created an Ansible role for Open Ondemand that I’ve been testing on Ubuntu Bionic.

It should be said, as it is in the README, that this is still a work in progress. The runs I’ve been doing last week and today produce some zombie processes. Which is to say, this install procedure is not yet production ready, not by a long shot. So, use patience and caution as we update it.

Hi everyone. With my employer, I’m in the process of setting up an Ubuntu 18.04 cluster running openPBS with Singularity containers. I’m getting up to speed with Open OnDemand configuration. I don’t have a background with Ansible. I’m going through some online Ansible courses now to at least understand how to use Ansible roles. I haven’t figured out how to make use of the Ansible role created here yet.

If there is a step-by-step guide on how to implement this Ansible role, or if someone has a good resource on using an existing Ansible role, please send it my way. I’m also open to general discussion about Open OnDemand on Ubuntu.

If you have container support on your local machine, I’d suggest running through the test cases for the role. That may be a clean way to get familiar with ansible because things ‘just work’ (even if they run through the testing framework, molecule).

As you can see from the default test case, it simply runs through the role. That’s as simple as a playbook can get: 1 role, no variables. Of course all the defaults are for CentOS/RHEL, but it’s a container, so you can at least get your hands dirty with ansible and the role.

You can run these commands to get that all setup (the current working directory being the role’s directory)

pip install -r molecule/requirements.txt
molecule converge

Taking that a step further with playbooks, inventories and config files, here’s what I have on hand to test this role in an ubuntu:20.04 container. Your inventory file may differ if you’re using docker or a VM you can modify (and perhaps throw away).

[jeff 04:48:59 ansible()] 🐼 cat playbooks/open-ondemand.yml 
- hosts: ondemand-hosts
  roles:
  - ondemand
   
[jeff 04:49:03 ansible()] 🐍 cat inventories/localhost-containers 
[ondemand-hosts]
ubuntu ansible_connection=podman ansible_python_interpreter=/usr/bin/python3

[jeff 04:51:01 ansible()] 🐠 cat conf/ood-src.yml 
ood_source_version: "v1.8.19"
install_from_src: true

# ubuntu 20.04 location
ruby_lib_dir: "/usr/lib/x86_64-linux-gnu/ruby/2.7.0" 

[jeff 04:52:01 ansible()] 🐯 ansible-playbook -i inventories/localhost-containers --extra-vars=@conf/ood-src.yml playbooks/open-ondemand.yml

Another note on testing in containers is that the default container needs sudo, python3 and python3-pip installed, so you can’t use an off-the-shelf ubuntu container.
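
As a rough sketch of that check, the shell function below reports which of those tools are missing from a container. The function name missing_tools is made up for illustration, and pip3 is assumed to be the command that the python3-pip package provides.

```shell
# Hypothetical helper: print each named command that is not on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Example: check the three tools the test container needs.
missing_tools sudo python3 pip3
```

Anything it prints can be added to the image with apt-get update followed by apt-get install -y sudo python3 python3-pip.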

Hope that helps!

Also - I’m now seeing that the role names are different which could be confusing.

This is what my ansible roles directory looks like. I’ve symlinked ood-ansible to ondemand, so that’s why I’m able to reference the role that way.

[jeff 05:00:24 images(master)] 🐭 ls ~/.ansible/roles/ -l 
drwxrwxr-x. 13 jeff jeff 4096 Mar  5 12:51 ondemand
lrwxrwxrwx.  1 jeff jeff    8 Jan 23  2020 ood-ansible -> ondemand
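
For anyone wanting the same layout, a symlink like the one in the listing can be created roughly as follows. This is only a sketch: the function name link_role_alias is hypothetical, and the roles path is assumed to be ~/.ansible/roles as in the listing.

```shell
# Hypothetical helper: make ALIAS a relative symlink to ROLE inside ROLES_DIR.
link_role_alias() {
  roles_dir=$1 role=$2 alias_name=$3
  # -s: symbolic link; -f: replace an existing link; -n: treat an existing
  # link to a directory as a file instead of creating the link inside it.
  (cd "$roles_dir" && ln -sfn "$role" "$alias_name")
}

# Matches the listing: ood-ansible -> ondemand
# link_role_alias "$HOME/.ansible/roles" ondemand ood-ansible
```

The call is left commented out so that nothing in a real roles directory changes by accident.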

Thanks for the quick reply @jeff.ohrstrom

For testing purposes, I do have a couple of VMs set up (one server and one compute node). I’ve already installed Ansible through the PPA. I need to get my head wrapped around configuring and using Ansible playbooks and roles. I’ll work through what you provided. Let me know if you’d like to fork this off to another topic.

Getting back to this. After trying to get this Ansible role working in an Ubuntu 18.04 VM and running into numerous dependency issues, I grabbed an Ubuntu 20.04 container and shelled into it with Singularity to install the Ansible role. I had to install numerous dependent packages that weren’t included, but I got all of the molecule/requirements.txt packages to install.

However, now I got this error with molecule. Any thoughts on how to handle it?

Singularity> molecule converge
--> Test matrix
    
└── default
    ├── dependency
    ├── create
    ├── prepare
    └── converge
    
--> Scenario: 'default'
--> Action: 'dependency'
Skipping, missing the requirements file.
Skipping, missing the requirements file.
--> Scenario: 'default'
--> Action: 'create'
--> Sanity checks: 'docker'
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/urllib3/connectionpool.py", line 699, in urlopen
    httplib_response = self._make_request(
  File "/usr/local/lib/python3.8/dist-packages/urllib3/connectionpool.py", line 394, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/usr/lib/python3.8/http/client.py", line 1230, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/lib/python3.8/http/client.py", line 1276, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.8/http/client.py", line 1225, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.8/http/client.py", line 1004, in _send_output
    self.send(msg)
  File "/usr/lib/python3.8/http/client.py", line 944, in send
    self.connect()
  File "/usr/local/lib/python3.8/dist-packages/docker/transport/unixconn.py", line 43, in connect
    sock.connect(self.unix_socket)
FileNotFoundError: [Errno 2] No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/requests/adapters.py", line 439, in send
    resp = conn.urlopen(
  File "/usr/local/lib/python3.8/dist-packages/urllib3/connectionpool.py", line 755, in urlopen
    retries = retries.increment(
  File "/usr/local/lib/python3.8/dist-packages/urllib3/util/retry.py", line 532, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/usr/local/lib/python3.8/dist-packages/urllib3/packages/six.py", line 734, in reraise
    raise value.with_traceback(tb)
  File "/usr/local/lib/python3.8/dist-packages/urllib3/connectionpool.py", line 699, in urlopen
    httplib_response = self._make_request(
  File "/usr/local/lib/python3.8/dist-packages/urllib3/connectionpool.py", line 394, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/usr/lib/python3.8/http/client.py", line 1230, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/lib/python3.8/http/client.py", line 1276, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.8/http/client.py", line 1225, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.8/http/client.py", line 1004, in _send_output
    self.send(msg)
  File "/usr/lib/python3.8/http/client.py", line 944, in send
    self.connect()
  File "/usr/local/lib/python3.8/dist-packages/docker/transport/unixconn.py", line 43, in connect
    sock.connect(self.unix_socket)
urllib3.exceptions.ProtocolError: ('Connection aborted.', FileNotFoundError(2, 'No such file or directory'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/docker/api/client.py", line 214, in _retrieve_server_version
    return self.version(api_version=False)["ApiVersion"]
  File "/usr/local/lib/python3.8/dist-packages/docker/api/daemon.py", line 181, in version
    return self._result(self._get(url), json=True)
  File "/usr/local/lib/python3.8/dist-packages/docker/utils/decorators.py", line 46, in inner
    return f(self, *args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/docker/api/client.py", line 237, in _get
    return self.get(url, **self._set_request_timeout(kwargs))
  File "/usr/local/lib/python3.8/dist-packages/requests/sessions.py", line 555, in get
    return self.request('GET', url, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/requests/sessions.py", line 542, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python3.8/dist-packages/requests/sessions.py", line 655, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/requests/adapters.py", line 498, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', FileNotFoundError(2, 'No such file or directory'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/molecule", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/click/decorators.py", line 21, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/molecule/command/converge.py", line 104, in converge
    base.execute_cmdline_scenarios(scenario_name, args, command_args, ansible_args)
  File "/usr/local/lib/python3.8/dist-packages/molecule/command/base.py", line 104, in execute_cmdline_scenarios
    execute_scenario(scenario)
  File "/usr/local/lib/python3.8/dist-packages/molecule/command/base.py", line 146, in execute_scenario
    execute_subcommand(scenario.config, action)
  File "/usr/local/lib/python3.8/dist-packages/molecule/command/base.py", line 135, in execute_subcommand
    return command(config).execute()
  File "/usr/local/lib/python3.8/dist-packages/molecule/command/create.py", line 94, in execute
    self._config.provisioner.create()
  File "/usr/local/lib/python3.8/dist-packages/molecule/provisioner/ansible.py", line 722, in create
    pb.execute()
  File "/usr/local/lib/python3.8/dist-packages/molecule/provisioner/ansible_playbook.py", line 104, in execute
    self._config.driver.sanity_checks()
  File "/usr/local/lib/python3.8/dist-packages/molecule_docker/driver.py", line 234, in sanity_checks
    docker_client = docker.from_env()
  File "/usr/local/lib/python3.8/dist-packages/docker/client.py", line 96, in from_env
    return cls(
  File "/usr/local/lib/python3.8/dist-packages/docker/client.py", line 45, in __init__
    self.api = APIClient(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/docker/api/client.py", line 197, in __init__
    self._version = self._retrieve_server_version()
  File "/usr/local/lib/python3.8/dist-packages/docker/api/client.py", line 221, in _retrieve_server_version
    raise DockerException(
docker.errors.DockerException: Error while fetching server API version: ('Connection aborted.', FileNotFoundError(2, 'No such file or directory'))

You’re trying to get ansible to connect to a docker container, which it can’t because the docker socket file doesn’t exist.

I don’t think ansible has support for Singularity. When I look at connection plugins I see docker, podman, lxc, lxd, jail and chroot but no singularity.
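
A quick way to confirm this before running molecule is to test for the socket directly. The sketch below assumes the Docker daemon’s default socket path, /var/run/docker.sock; the function name docker_socket_present is made up for illustration, and a setup that points DOCKER_HOST somewhere else would need a different check.

```shell
# Hypothetical helper: succeed only if the given path (default: the usual
# Docker socket location) exists and is a Unix socket.
docker_socket_present() {
  [ -S "${1:-/var/run/docker.sock}" ]
}

if docker_socket_present; then
  echo "docker socket found"
else
  echo "no docker socket: molecule's docker driver cannot reach a daemon here"
fi
```

Inside a Singularity shell the second branch is the expected one unless the host’s socket has been bound into the container.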

If you have issues with the 18.04 VM, feel free to open tickets on Github and I can try to sort through them.

This seems like an obvious step before moving to more dashboards.
Portability is very important. The majority of AI/DL work is on Ubuntu, and RHEL licensing and the changes to CentOS add to that. I have written a lot of RPMs and RPM spec files, so if we have the spec files we should be able to automate on other platforms fairly simply.

I am looking at the Ansible for OSC right now. So that may work.
It seems to meet the basic requirements of what I need, except for Ubuntu setup and updates.

I am starting to look at what:

  1. Packages are needed
  2. Any package modifications
  3. Configurations.

Then we should be able to take this and build *.deb packages and *.rpm packages.
Or a ‘script based’ install for the others.

I am starting to go through this now. So if anyone wants to work together to make this work on Ubuntu I am starting on that.

Or if I need to ‘reinvent’ to make it on Ubuntu, I will just make a similar package that has the basic functionality for Ubuntu.
- Slurm
- JupyterLab/Notebook
- TurboVNC - web
- Grafana style dashboard
- OpenLDAP - option
- View files and scheduling resources
- Anaconda/Python Virtual env to control versions

Thank you!

Mark

We’re definitely going to be shipping .deb packages at some point.

I started this project to track our progress: Ubuntu Packaging · GitHub

Getting back to this after having a new cluster up and running on-site.

My new cluster is based on Ubuntu 18.04 with OpenPBS for a job scheduler. I have the latest Ansible setup on my main head node and pinging all compute nodes. I’ve cloned the ood-ansible repo and am currently trying to use the Ansible role to deploy OOD.

I’m trying to figure out exactly how to use this Ansible role. I’m a novice user of Ansible so maybe I’m not understanding something obvious. Any help with using this role is welcomed. Thanks!

We’re starting to publish .deb files for 20.04 if that’s of interest to you, though the ansible role doesn’t have support for installing them yet.

That said, here’s the test playbook we use. You’ll need the variable install_from_src set to true. Then you’ll need an inventory. If you want to run the playbook on the host itself, then I guess it’d be localhost. Otherwise you’ll initiate it all from, say, your own laptop, and the inventory entry is the FQDN of the server you want to install on.
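
To make that concrete, here is a minimal sketch that writes a one-role playbook and a localhost inventory into a directory. It assumes the role is installed under the name osc.open_ondemand (the Galaxy name used in this thread); the function name write_ood_scaffold, the directory and the file names are hypothetical. ansible_connection=local is a standard Ansible inventory variable that runs tasks on the control host instead of over SSH.

```shell
# Hypothetical helper: write a playbook and inventory for a local install into DIR.
write_ood_scaffold() {
  dir=$1
  mkdir -p "$dir"

  # One role, one variable: build Open OnDemand from source.
  cat > "$dir/open-ondemand.yml" <<'EOF'
- hosts: ondemand-hosts
  vars:
    install_from_src: true
  roles:
    - osc.open_ondemand
EOF

  # Run against the machine the playbook is launched from.
  # For a remote install, replace the host line with the server's FQDN.
  cat > "$dir/inventory" <<'EOF'
[ondemand-hosts]
localhost ansible_connection=local
EOF
}

write_ood_scaffold ./ood-test
echo "next: ansible-playbook -i ./ood-test/inventory ./ood-test/open-ondemand.yml"
```

If the role is not installed yet, ansible-galaxy install osc.open_ondemand should fetch it, assuming that is still its name on Galaxy.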

Maybe you’re looking for storing and finding roles?
https://docs.ansible.com/ansible/latest/user_guide/playbooks_reuse_roles.html#storing-and-finding-roles

Or maybe you want to pull osc.open_ondemand from galaxy?
https://docs.ansible.com/ansible/latest/galaxy/user_guide.html#installing-collections

Thanks. Looking through the YAML files, I had been wondering if I need to install from source by setting the install_from_src flag.

My uncertainty seems to be with what to pass to the command ansible-playbook and also where to configure my cluster for OOD. From what you said, it seems I need to make my own playbook similar to converge.yml. Would I configure my cluster with /etc/ood/config/clusters.d/<cluster_key>.yml after building and installing from the role?

I’ll check out the role available in Ansible Galaxy.

Hello,

I’m very interested in testing Open OnDemand on an Ubuntu 20.04 cluster.

@Chase did you make progress in installing Open OnDemand? For your interest, I’m experimenting with a Vagrant virtual Slurm cluster and have at least succeeded in importing the playbook and adding some variables to it.

This is here. Note that this is, for now, failing at the task “TASK [osc.open_ondemand : build the project (this will take some time)]” (see the README on the GitHub repo) because of some rake build issues I have not investigated yet.

@jeff.ohrstrom you mention providing deb packages for Ubuntu 20.04. That would probably be more convenient than building from source with the Ansible Galaxy role. Would you mind indicating where the deb files can be downloaded?

The deb packages at this time only exist for OnDemand 2.1 which still only has unstable nightly releases. The stable releases for OnDemand 2.0 do not have deb packages.

Has anyone installed OnDemand for ubuntu 20.04? Thanks for any help.