OOD in a single workstation

lhcamilo · November 11, 2024, 7:36am

Hello there,

We have a deep learning workstation in our department with 4 GPUs, and lately we have had more users than GPUs, so I thought that introducing OOD with Slurm as a queue system would solve our troubles.

The software we use is mainly Jupyter, DeepLabCut, and interactive desktops.

I understand that this is not the intended use of OOD, so I am just hoping to understand things a little better to tweak them.

For the most part everything works fine: I am able to submit jobs and create Jupyter and desktop sessions, but I am having trouble connecting to them.

For the interactive desktop, noVNC is not configured to the correct websocket port. I can check the connection.yml file and enter the correct port manually so it is not the end of the world but it is not ideal.

In a similar fashion, Jupyter points to the wrong location. I can again go to connection.yml, check the port and password, and enter them manually, but I really want to remove this friction for my users, who are not exactly tech savvy.

I think I must understand better how OOD handles things, but web technology is not my forte.

If anyone has insight into troubleshooting this or experience with this unconventional deployment I would appreciate the help.

Best,

jeff.ohrstrom · November 12, 2024, 9:10pm

Hi and welcome!

I’d say let’s solve this one first as it’s easier. Can you share the view.html.erb for this Jupyter application?

lhcamilo · November 13, 2024, 11:23am

Hi there,

Thank you for taking the time to troubleshoot this with me, I really appreciate it.

Jupyter is a fine place to start. I took a stab at view.html.erb, but it became apparent to me that I did not understand how things were being handled.

<form action="/node/<%= host %>/<%= port %>/login" method="post" target="_blank">
  <input type="hidden" name="password" value="<%= password %>">
  <button class="btn btn-primary" type="submit">
    <i class="fa fa-eye"></i> Connect to Jupyter
  </button>
</form>

This seems to match the logic at ./template/before.sh.erb

c.NotebookApp.base_url = '/node/${host}/${port}/'

From the output of one jupyter session I got this.

[I 2024-11-08 10:18:01.466 ServerApp] Jupyter Server 2.14.2 is running at:
[I 2024-11-08 10:18:01.466 ServerApp] http://localhost:21857/node/navu/21857/lab
[I 2024-11-08 10:18:01.466 ServerApp]     http://127.0.0.1:21857/node/navu/21857/lab

So I tried simply prepending the port to the address in the view.html.erb code, without success.
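
A note on where that form actually sends the browser: the action in view.html.erb is a path-only URL, so the browser resolves it against the address of the OOD portal page, not against Jupyter's own port. The sketch below illustrates this with Python's standard URL handling. The portal host name `ood.example.edu` is a placeholder (the real one is not given in this thread); the host `navu` and port `21857` are taken from the Jupyter log above.

```python
from urllib.parse import urljoin, urlsplit

# Hypothetical portal address; the real host name is not given in this thread.
portal_page = "http://ood.example.edu/pun/sys/dashboard/batch_connect/sessions"

# Form action from view.html.erb with host="navu" and port=21857 filled in.
form_action = "/node/navu/21857/login"

# A path-only action keeps the scheme, host, and port of the page it sits on.
resolved = urljoin(portal_page, form_action)

# With no explicit port in the URL, the browser uses the scheme's default.
parts = urlsplit(resolved)
effective_port = parts.port or {"http": 80, "https": 443}[parts.scheme]

print(resolved)
print(effective_port)
```

In other words, the request goes to the portal's web server, and the portal then has to forward anything under /node/ to the right host and port. Adding the port to the address in view.html.erb does not change that.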

jeff.ohrstrom · November 13, 2024, 4:01pm

Yea this all should just work. That view.html.erb looks good.

If you right click the button and ‘Inspect’ the HTML do you see the correct password in the form’s HTML?

lhcamilo · November 14, 2024, 6:39am

Yep, I have looked at the page source and as far as I can tell the password is correct.

Though I find it odd that the error I get is a 404 on port 80.

So I am thinking that maybe I messed up something in the ood_portal.yml config. Which is likely since I am a complete noob at web stuff.

ood_portal.yml (13.5 KB)

I reckon there are nuances to running both OOD and jobs on the same machine that I do not quite grasp. At times, I wonder if it would not have been better to deploy OOD in an LXC container.

jeff.ohrstrom · November 14, 2024, 2:47pm

OK that’s easy, you haven’t enabled the reverse proxy yet. Follow these instructions and you should be good to go.

https://osc.github.io/ood-documentation/latest/how-tos/app-development/interactive/setup/enable-reverse-proxy.html

As for the LXC container: I doubt it would have helped. You’d have the same trouble you’re having now, but you’d also potentially have container issues on top of OOD setup issues.
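
For readers following along: the linked instructions come down to setting `host_regex`, `node_uri`, and `rnode_uri` in ood_portal.yml and regenerating the Apache config. The portal only forwards `/node/<host>/<port>` requests whose host matches `host_regex`, so on a single workstation the regex has to match the workstation's own host name. Below is a rough Python approximation of that matching idea, not the actual Apache rule. The function name `proxy_target` and the regex values are illustrative assumptions; `navu` and `21857` come from the Jupyter log earlier in the thread.

```python
import re

def proxy_target(path, host_regex, node_uri="/node"):
    """Return (host, port) if the path would be proxied, else None.

    Approximates the portal's rule: <node_uri>/<host>/<port>/..., where
    <host> must match host_regex. Illustrative only.
    """
    pattern = re.compile(
        "^" + re.escape(node_uri)
        + "/(?P<host>" + host_regex + r")/(?P<port>\d+)(?P<rest>/.*)?$"
    )
    match = pattern.match(path)
    if match is None:
        return None
    return match.group("host"), int(match.group("port"))

request_path = "/node/navu/21857/login"

# A regex that matches the workstation's short host name.
matched = proxy_target(request_path, r"navu")

# A cluster-style regex (hypothetical domain) that does not match "navu".
unmatched = proxy_target(request_path, r"[\w.-]+\.example\.edu")

print(matched)
print(unmatched)
```

If the regex does not match what `host` expands to in the session, the portal will not proxy the request, so it is worth checking that value against the host name shown in the session's connection.yml.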

lhcamilo · November 15, 2024, 11:48am

Thanks a lot, that seems to have done the trick.
Both Jupyter and noVNC seem to be working as expected.

I now just need to troubleshoot the issue that VNC sessions are not being killed or cleaned up when the delete button is pressed. I have looked around some other posts and it seems that this is related to lock files in /tmp.

But that is a topic for another thread.

Thanks again for the help

jeff.ohrstrom · November 15, 2024, 2:52pm

What type of scheduler do you run? If it’s Slurm you need to set ProctrackType to proctrack/cgroup, because some of these VNC server processes have a parent PID of 1.

https://slurm.schedmd.com/slurm.conf.html#OPT_ProctrackType
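
A small sketch for checking which value a slurm.conf currently carries, assuming the file uses plain `Key=Value` lines with `#` comments. The helper name `slurm_setting` and the sample config text are made up for illustration; only `ProctrackType=proctrack/cgroup` comes from this thread.

```python
def slurm_setting(conf_text, key):
    """Return the value of the last `key=value` line found, or None.

    Comments starting with '#' are stripped, and the key is compared
    case-insensitively. This is a simple reader, not a full parser.
    """
    value = None
    for raw_line in conf_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        name, val = line.split("=", 1)
        if name.strip().lower() == key.lower():
            value = val.strip()
    return value

# Hypothetical excerpt; a real slurm.conf has many more settings.
sample_conf = """
# slurm.conf excerpt
ClusterName=workstation
ProctrackType=proctrack/cgroup   # track job processes via cgroups
"""

proctrack = slurm_setting(sample_conf, "ProctrackType")
print(proctrack)
```

To use it on a real system, read the file's contents into a string first and pass that in place of `sample_conf`.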

lhcamilo · November 22, 2024, 12:11pm

Hi there,

Sorry for the delay in responding.

It seems that setting ProctrackType to proctrack/cgroup has indeed solved the issue.

Thanks a lot.

system · May 21, 2025, 12:12pm

This topic was automatically closed 180 days after the last reply. New replies are no longer allowed.
