OOD internal server error

I am having issues running OOD over a public network. My cluster is set up on an internal private network (the IPs used in the inventory file are all private). I have updated config/group_vars/slurm-cluster.yml with the following:

servername: publicIP
httpd_port: 9050
httpd_listen_addr_port:
  - 9050
Result: did not work.

servername: public hostname
httpd_port: 9050
httpd_listen_addr_port:
  - 9050
Result: did not work.

servername: public hostname
httpd_port: 80
httpd_listen_addr_port:
  - 80
Result: the login page works, but after logging in I get redirected to an internal server error.

I have also tried updating the /etc/ood/config/ood_portal.yml file directly, after reading the official OOD docs, and then regenerating the Apache config with the command $ sudo /opt/ood/ood-portal-generator/sbin/update_ood_portal
That also fails with:
Generating new Apache config at: ‘/opt/rh/httpd24/root/etc/httpd/conf.d/ood-portal.conf’
sh: 1: cannot create /opt/rh/httpd24/root/etc/httpd/conf.d/ood-portal.conf: Directory nonexistent
chown root:www-data /opt/rh/httpd24/root/etc/httpd/conf.d/ood-portal.conf
No such file or directory @ apply2files - /opt/rh/httpd24/root/etc/httpd/conf.d/ood-portal.conf
Run ‘update_ood_portal --help’ to see a full list of available options.

I would imagine there is a way to have the slurm cluster run on private management ports but have OOD run on a public IP?

Hello and welcome!

A few questions up front:

  • What version of OOD are you running?
  • What OS are you on?

Did the web login work previously? I’m a little confused about where you are at and what you are trying to accomplish at the moment.

Hello and thank you!

I am using Ubuntu 20.04 and OOD was installed using deepops. Link:

It has never worked from the start, not sure why since the install script finishes successfully without any errors.
Here are the options I am able to set during install (link to file deepops/slurm-cluster.yml at master · NVIDIA/deepops · GitHub):
install_open_ondemand: yes
# OOD Linux-host adapter requires slurm_cluster_install_singularity to be true
ood_install_linuxhost_adapter: yes
servername: '{{ ansible_fqdn }}' # I have tried hard-coding this to my public IP and public domain
httpd_port: 9050
httpd_listen_addr_port:
  - 9050
httpd_use_rewrites: false
node_uri: /node
rnode_uri: /rnode

The ood_source_version: “v2.0.9”, sorry I forgot to include it earlier.

So sorry for the delay, I had to look at that deepops install to try and understand this.

That first error is a big clue:
Generating new Apache config at: ‘/opt/rh/httpd24/root/etc/httpd/conf.d/ood-portal.conf’
sh: 1: cannot create /opt/rh/httpd24/root/etc/httpd/conf.d/ood-portal.conf: Directory nonexistent
chown root:www-data /opt/rh/httpd24/root/etc/httpd/conf.d/ood-portal.conf
No such file or directory @ apply2files - /opt/rh/httpd24/root/etc/httpd/conf.d/ood-portal.conf
Run ‘update_ood_portal --help’ to see a full list of available options.

The path in the top line, /opt/rh/httpd24/root/etc/httpd/conf.d/ood-portal.conf, is a CentOS/Red Hat location, which does not exist on an Ubuntu host.
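
A quick way to confirm which Apache layout the web node actually has is to probe for the two candidate config directories. The sketch below assumes the stock Ubuntu apache2 package layout (/etc/apache2/sites-available) and the Software Collections layout from the error message; detect_apache_layout and its optional root-prefix argument are hypothetical, not part of OOD or deepops, and the prefix exists only so the check can be tried against a scratch directory.

```shell
#!/bin/sh
# Report which Apache config layout exists under an optional root prefix.
# detect_apache_layout is a hypothetical helper, not an OOD or deepops command.
detect_apache_layout() {
  root="${1:-}"
  if [ -d "$root/etc/apache2/sites-available" ]; then
    echo "debian"   # stock Ubuntu/Debian apache2 package layout
  elif [ -d "$root/opt/rh/httpd24/root/etc/httpd/conf.d" ]; then
    echo "scl"      # httpd24 Software Collections layout (CentOS/RHEL 7)
  else
    echo "unknown"  # neither directory exists under this prefix
  fi
}
```

Calling detect_apache_layout with no argument on the web node should print "debian" on a normal Ubuntu install. If it does, the generator is writing to a path that will never exist there, whatever the port settings are.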

So, it looks like quite a bit of configuration has happened to run the deepops repo code. The fix could be as easy as changing the OS on whichever machine runs the OOD web node: if you try this with something like CentOS, it could not only build correctly but also actually work.

It is strange that it builds without throwing any errors and without making a basic check of which OS it is using. But then I see that it is trying to do this with some kind of PXE boot that configures all this in an OS-agnostic way. However, OOD itself would need to know the OS in order to adjust that very path when generating the config file for you, as seen in ood_portal_generator:

But I’d have to look more into how they are doing all this to understand it better. Is Ubuntu a requirement, or can you try the build with CentOS?

Hello and thank you for looking into this. I do have a requirement to continue using Ubuntu and cannot switch over to CentOS.

I would also like to note that the error:
“Generating new Apache config at: ‘/opt/rh/httpd24/root/etc/httpd/conf.d/ood-portal.conf’
sh: 1: cannot create /opt/rh/httpd24/root/etc/httpd/conf.d/ood-portal.conf: Directory nonexistent
chown root:www-data /opt/rh/httpd24/root/etc/httpd/conf.d/ood-portal.conf
No such file or directory @ apply2files - /opt/rh/httpd24/root/etc/httpd/conf.d/ood-portal.conf
Run ‘update_ood_portal --help’ to see a full list of available options.”
only happens if I try to update the config with sudo /opt/ood/ood-portal-generator/sbin/update_ood_portal. During the deepops install I do not see this error (or any other OOD error).

Are you concluding from this error that deepops automatically tried to install OOD for CentOS rather than Ubuntu?

Looking under ood-wrapper/vars I see configs for both CentOS and Ubuntu, so I would assume deepops has been tested on both.

Yeah, this is a bit strange. I am unfamiliar with deepops so I can’t speak to what they can or can’t do, only what I have here :)

It looks like you are using their config to set ondemand up then running commands within that web-node to try and correct internal server errors you get, is that correct?

What is the internal server error you see after the login?

“It looks like you are using their config to set ondemand up then running commands within that web-node to try and correct internal server errors you get, is that correct?”
That is correct. To get around this I have been directly changing deepops config and rerunning the full install to update settings (this is a pain since deepops config takes ~20 mins to run so simply updating OOD would be useful for testing)

“What is the internal server error you see after the login?”
When I set deepops config to run on port 80 I am greeted with a login page, after I enter my login (correct or incorrect gives the same error):
Internal Server Error
The server encountered an internal error or misconfiguration and was unable to complete your request.
Please contact the server administrator at [no address given] to inform them of the time this error occurred, and the actions you performed just before this error.
More information about this error may be available in the server error log.
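
To follow up on that last line, the Apache error log on the web node is the place to look. A minimal sketch, assuming the default Ubuntu location /var/log/apache2/error.log (the OOD virtual host may log to a differently named file in the same directory); show_last_errors is a hypothetical helper name that prints the tail of the first readable log among the paths it is given.

```shell
#!/bin/sh
# Print the last N lines of the first readable file among the candidates.
# show_last_errors is a hypothetical helper, not an OOD or Apache command.
show_last_errors() {
  count="$1"
  shift
  for log in "$@"; do
    if [ -r "$log" ]; then
      echo "== $log =="
      tail -n "$count" "$log"
      return 0
    fi
  done
  # Nothing readable: report on stderr and signal failure to the caller.
  echo "no readable log found among: $*" >&2
  return 1
}
```

For example, show_last_errors 20 /var/log/apache2/error.log, run as a user allowed to read that directory, shows the entries written right after a failed login attempt.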

If I set the deepops OOD port to 9050, the error I get is “This site can’t be reached”, and port 80 goes back to the default Apache setup page.
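
One way to tell whether Apache is listening on 9050 at all, as opposed to the port being blocked somewhere between the public network and the node, is to check the listening sockets on the web node itself. A sketch, assuming `ss` from iproute2 is installed; port_is_listening is a hypothetical helper that reads `ss -tln` output on stdin and succeeds if any local address ends in the given port.

```shell
#!/bin/sh
# Succeed if the given port appears as a local listening port in
# `ss -tln` style output read from stdin.
# port_is_listening is a hypothetical helper name.
port_is_listening() {
  awk -v port="$1" '
    {
      # Field 4 is "Local Address:Port"; the port is the last ":" segment,
      # which also handles IPv6 forms such as [::]:80.
      n = split($4, parts, ":")
      if (parts[n] == port) found = 1
    }
    END { exit found ? 0 : 1 }
  '
}
```

Usage on the web node: ss -tln | port_is_listening 9050 && echo "listening" || echo "not listening". If the port is listening locally but unreachable from outside, a firewall or security-group rule is the more likely cause than the OOD config.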