Run OOD in WSL on a Win10 VDI?

Apologies in advance for the novice question. What prevents me from running my own instance of OOD in my local WSL to submit jobs to our campus cluster (or any other), where WSL in turn is running in a Windows 10 VDI? Thanks.

Best,
CB

I think the hurdle would be configuring it so that you can submit, and are allowed to submit, jobs as your user from your local machine to the cluster. I would expect you to hit some issues with user mapping and permissions, but exactly which ones I don’t know. It’s a very peculiar way to use the software, but it could probably be done with enough effort. Why not just request the center to install it :)

Thanks 10x for the reply and the question!

I’d prefer a local test instance in order to become better informed (both as a user and salesman) before I pitch to the center for resources.

Re center installation:

  1. We’d need to allocate a separate “center”-provided VM server, since we don’t allow web services on the cluster login nodes.
  2. The center prefers specific resource requirements, which I haven’t been able to infer so far from reading the docs.
  3. Limitations: IIRC, the current center-provided virtual servers are preferably 8-core vice 16-core to avoid NUMA-related performance degradation.
  4. I admittedly don’t know what would be involved in providing secure login to all resources involved.
  5. Other?

Thanks.

I think the following page in the docs might help with the requirements part:
https://osc.github.io/ood-documentation/latest/requirements.html

There’s also the HPC-toolset-tutorial, a docker-compose setup that gives you a web front-end for OOD and some clusters to submit to and log in to. This may help you wrap your mind around some of the docs and how things work:

And the part of the tutorial that covers OOD once you have it running:

To allow secure logins, OOD will work with various identity providers using Apache modules to handle auth and map the user and groups correctly:
https://osc.github.io/ood-documentation/latest/authentication.html
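
The mapping step can be pictured as a rule that turns the identity asserted by the identity provider into a local cluster username. A minimal Python sketch of that idea follows. It is not OOD's actual implementation, and the `example.edu` domain, the pattern, and the function name are all hypothetical:

```python
import re
from typing import Optional

# Hypothetical rule: accept only identities from one institution's domain
# and use the part before the "@" as the local username.
USER_MAP = re.compile(r"([a-z][a-z0-9_-]*)@example\.edu")


def map_remote_user(remote_user: str) -> Optional[str]:
    """Map an authenticated identity (for example an email-style claim)
    to a local username. Returning None means the login is refused."""
    candidate = remote_user.strip().lower()
    match = USER_MAP.fullmatch(candidate)
    return match.group(1) if match else None


if __name__ == "__main__":
    for identity in ("hpc.user@example.edu", "jdoe@example.edu", "jdoe@other.org"):
        print(identity, "->", map_remote_user(identity))
```

Note that `hpc.user@example.edu` is refused by this pattern because of the dot; whether a real rule should accept such names depends on the cluster's own account naming.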

Thanks always for the information.

Is there any guidance on how many FTE’s are needed just to manage OOD? Thanks.

Not many, but it’s really an organization’s choice. There isn’t much to maintain: once you’ve set it up, it pretty much just runs, so failures of the actual webserver are rare.

Staff time is more likely to be spent debugging applications like RStudio or Jupyter, or updating the applications themselves to support different features.

Ultimately OOD is submitting jobs to your compute nodes, so failures originating from the modules that are being loaded or OS upgrades are more common than the webserver itself being broken.

I think there’s a lot of staff time put in to set it up and get it going. But once it’s going, it’s pretty much interrupt driven by customer tickets either wanting features out of your apps or apps themselves not working correctly.

That said - we don’t spend a lot of time on our OSC deployment of OnDemand. We get maybe 1-3 customer tickets a month (a very liberal estimate) and tweak/update our applications sparingly (you can check our GitHub activity to see how often we update our production applications and configs).

Re: Why not just request the center to install it :)

I appreciate your continued patience in the following for those of us in IT that are research-facing but with limited, if any, HPC privileges.

I’m trying (perhaps naively so - to the extent that every datacenter is different) here to…

  1. Justify the resource expenditure for yet another VM web server and
  2. Anticipate that the datacenter will prefer if not expect that I alone perform all the due diligence and discovery up front before I even broach the subject of deployment of OOD for the institution.

In particular in the interest of security, IMHO it would be nice if there were a single place to go in the docs or otherwise to address the datacenter’s anticipated security concerns for introducing the additional attack surface presented by deploying OOD.

Arguably, I would be remiss if I failed to anticipate the above argument.

In just a cursory search of the documentation and support issues, I see the following.
Docs:
2. Authentication
3. Secure Apache httpd
Support: Preventing a JS Attack?

Comments welcome, and thanks again.

Hello, I have a docker-compose project that sets up OnDemand, authorization for an account (hpc.user) and a cluster that works for me with WSL. GitHub - matt257/ondemand-compose: Deploy Open Ondemand locally with Docker Compose

It is not finished and job submission isn’t working yet, but I will be updating it as I am able.

There are security features we’d like to implement in OnDemand, but with thoughtful configuration and controls we’ve found OnDemand to be secure. In 2018 and 2021 the NSF Trusted CI completed engagements and assessments of OnDemand’s security. Trusted CI Blog: 2021 Open OnDemand Engagement Concludes.

If you’d like to discuss specific security concerns for your center please let me know here or email me at mwalton@osc.edu.

Matt,

My apologies for the delayed response, but can you say how much space Docker takes up on your WSL - and comment on any other potential roadblocks?

For example, I installed Apptainer in my WSL and had to quickly uninstall it because it used so much disk space.

Here’s some of my sysinfo.

(VDI-Windows terminal)

wsl --version
WSL version: 2.3.24.0
Kernel version: 5.15.153.1-2
WSLg version: 1.0.65
MSRDC version: 1.2.5620
Direct3D version: 1.611.1-81528511
DXCore version: 10.0.26100.1-240331-1435.ge-release
Windows version: 10.0.22631.3737

(WSL/Ubuntu terminal)
cbaribault:~$ df
Filesystem  1K-blocks     Used Available Use% Mounted on
none          2006680        0   2006680   0% /usr/lib/modules/5.15.153.1-microsoft-standard-WSL2
none          2006680        4   2006676   1% /mnt/wsl
drivers     104735740 73569244  31166496  71% /usr/lib/wsl/drivers
/dev/sdc   1055762868  2504232 999555164   1% /
none          2006680       76   2006604   1% /mnt/wslg
none          2006680        0   2006680   0% /usr/lib/wsl/lib
rootfs        2003264     2208   2001056   1% /init
none          2006680      480   2006200   1% /run
none          2006680        0   2006680   0% /run/lock
none          2006680        0   2006680   0% /run/shm
tmpfs            4096        0      4096   0% /sys/fs/cgroup
none          2006680       76   2006604   1% /mnt/wslg/versions.txt
none          2006680       76   2006604   1% /mnt/wslg/doc
C:\         104735740 73569244  31166496  71% /mnt/c
tmpfs          401336       16    401320   1% /run/user/1002
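
A note on reading the listing above: `/` reports about 1 TB because the WSL 2 root filesystem is a virtual disk. Assuming that virtual disk is stored on C: (the default), the practical ceiling for Docker images is the free space on /mnt/c, roughly 30 GiB here. Below is a minimal Python sketch for checking this before pulling images. The mount paths come from the listing, and the helper names are illustrative only:

```python
import shutil

GIB = 1024 ** 3


def free_gib(path: str) -> float:
    """Free space at `path` in GiB, as reported by the operating system."""
    return shutil.disk_usage(path).free / GIB


def tightest_mount(paths):
    """Return (path, free GiB) for the mount with the least free space.

    Paths that do not exist on this machine are skipped.
    """
    results = []
    for path in paths:
        try:
            results.append((path, free_gib(path)))
        except OSError:
            continue
    if not results:
        raise ValueError("none of the given paths could be inspected")
    return min(results, key=lambda item: item[1])


if __name__ == "__main__":
    # "/" is the WSL virtual disk; "/mnt/c" is the Windows drive assumed to back it.
    path, free = tightest_mount(["/", "/mnt/c"])
    print(f"Tightest mount: {path} with {free:.1f} GiB free")
```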

Best,
CB

For what it’s worth, I have an instance on a cluster with over 1k nodes whose OOD VM has only 4 cores and 16 GB of RAM. I would probably suggest doubling that, even though I don’t seem to have any load issues.

An OOD host server should be set up identically to a head node on the cluster, so any security concerns would match a head node’s. You can do auth in various ways; some (like LDAP) can be identical to how the head nodes handle auth. I tend to prefer OpenID Connect, hooking right into the relevant university’s upstream InCommon system, as most institutions are hooked into it. Then you can use CILogon or Globus as the IdP in the middle. In these cases, most institutions set their OIDC endpoint to require 2FA, so you can support 2FA without having to implement it yourself, since it’s handled upstream.

I think Jeff is on the money in terms of admin time for OOD servers. I admin 5 different active servers, all for different clusters, and most of the time is spent just standing them up and then adding or tweaking apps as users request new apps and changes to the options you give them.

Frankly, I am surprised your center’s HPC admin team has not already investigated OOD; it’s become kind of ubiquitous at all of the universities I have either directly set it up for or talked with. Just look at The Open OnDemand Community | Open OnDemand.