I have a few questions. If I should break this up into separate posts for each question, let me know and I’ll delete this post and make new ones.
The documentation says that the machine running OOD should be configured similarly to a cluster login host and should have scheduling software (Slurm, etc.) installed. Is this actually true? I ask because the cluster configuration contains the login/host key, which points to a machine that is reached by SSH and where (if I understand correctly) the actual job submission is done. So: is it required that the OOD node be able to submit jobs directly, or is it sufficient that it can do so via SSH? If the latter, it would simplify our production configuration a great deal.
When I try and open a shell in OOD I get “Failed to establish a websocket connection. Be sure you are using a browser that supports websocket connections.” (I’ve tried several browsers including Chrome) and I see this in the apache error log:
[Fri May 13 15:23:03.121366 2022] [proxy:warn] [pid 70886:tid 139721502082816] [client [my-ip-redacted]:51832] AH01144: No protocol handler was valid for the URL /pun/sys/shell/ssh/ip-[redacted-ip].us-west-2.compute.internal (scheme 'unix'). If you are using a DSO version of mod_proxy, make sure the proxy submodules are included in the configuration using LoadModule.
[Fri May 13 15:23:03.121806 2022] [lua:info] [pid 70886:tid 139721502082816] [client [my-ip-redacted]:51832] res_content_location="" log_hook="ood" res_content_disp="" log_id="LxQeQG78SKU" req_uri="/pun/sys/shell/ssh/ip-[redacted-ip].us-west-2.compute.internal" req_protocol="HTTP/1.1" remote_user="ubuntu" req_method="GET" res_content_encoding="" req_user_agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36 OPR/86.0.4363.50" time_proxy="0.457" req_accept_encoding="gzip, deflate, br" req_server_name="openondemand-staging.[redacted].org" res_content_length="632" req_accept="" res_content_type="text/html; charset=iso-8859-1" req_accept_language="en-us,en;q=0.9" time_user_map="0.003" res_location="" req_origin="https://openondemand-staging.[redacted].org" req_filename="proxy:ws://localhost/pun/sys/shell/ssh/ip-[redacted-ip].us-west-2.compute.internal?csrf=ZuZzcdOf-JleEqMHjBKY1vE8MFd3IpNF6Ne7" req_status="500" req_is_websocket="true" req_handler="proxy-server" req_cache_control="no-cache" res_content_language="" req_referer="" req_hostname="openondemand-staging.[redacted].org" req_user_ip="[my-ip-redacted]" log_time="2022-05-13T15:23:03.121720Z" req_is_https="true" req_accept_charset="" req_content_type="" req_port="443" local_user="ubuntu"
When I try and run an interactive desktop, the job fails with errors saying it cannot find vncpasswd and vncserver, even though these are installed and in the PATH on the compute nodes, along with the other requirements (websockify, nmap, etc).
I understand that Passenger apps are web apps that run on the OOD node, and Interactive Apps run on cluster compute nodes. Is it possible to write a custom web app, similar to a Passenger app, but have it run as an Interactive App on cluster compute nodes? If so, is this process documented anywhere? The only documentation I could find was for specific things like Jupyter and RStudio, not for custom web apps.
There are two variants here. In one, everything (squeue, sbatch and so on) is executed locally. In the other, you specify submit_host in the cluster.d file and we’ll SSH to that host and issue the commands there.
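To make the difference between the two variants concrete, here is a small illustrative sketch in Python. It is not how OOD implements this; `scheduler_command` and the host name `login.example.org` are hypothetical names made up for the example. It only shows the idea that the same scheduler command is either run as-is or wrapped in an `ssh` call to the submit host.

```python
import shlex


def scheduler_command(args, submit_host=None):
    """Return the argv for a scheduler command, run locally or through SSH.

    Illustration only: `submit_host` mirrors the idea of the cluster.d
    setting, but this helper is hypothetical and not part of OOD.
    """
    if submit_host is None:
        # Variant 1: the command runs directly on the web node.
        return list(args)
    # Variant 2: quote each word so the remote shell sees the same arguments.
    remote = " ".join(shlex.quote(a) for a in args)
    return ["ssh", submit_host, remote]


print(scheduler_command(["squeue"]))
print(scheduler_command(["sbatch", "job.sh"], submit_host="login.example.org"))
```

In the second variant the web node only needs an SSH client and working credentials for the submit host, which is the simplification asked about in the question.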
I think you need the WS Apache module
This should enable it (you may also have to install it).
sudo a2enmod proxy_wstunnel
Use script_wrapper to debug what your PATH actually is during the job’s execution. You expect those executables to be on it, but the question is what it really is.
Sure - but you’re writing a web app so it’s a bit of work. We have libraries available to sort of help - but I’ll bet there’s a learning curve.
Here’s an example of a workflow app that we maintain.
Here’s its basic premise. There’s a workflow with 4 steps. The first step is an interactive session to set up computations in a GUI (VFTSolid).
Here’s an image I’m presented with when that job is launched and running. As you indicate - it’s a Passenger app that started an interactive session (maybe/probably through our ood_core library)
You click through, do your thing. Then the next stages are batch jobs that generate Thermal and Structural analyses which you can view. I suspect you’re asking for something similar. As you’ll see it’s no small feat, and when I say we maintain it - you can tell how often we update it, so there’s a maintenance cost to these apps. This app specifically is more or less in maintenance mode where we only update it when it breaks and we need to.
Thanks, appreciate the quick and comprehensive answers.
That is what I wanted to know.
Worked a charm!
Still having some trouble. Edited /var/www/ood/apps/sys/bc_desktop/submit.yml.erb and added before_file: /home/ubuntu/before_file.sh. That file exists, is executable, and contains various diagnostics. I restarted apache and the PUN and then submitted a new desktop job. It failed with the same error and did not run my diagnostics. There was a before.sh script in the same directory as the job log that contains:
# Export the module function if it exists
[[ $(type -t module) == "function" ]] && export -f module
Did I do something wrong in setting up the before_file?
Thanks, that’s what I wanted to know. Ruby might be the path of least resistance but is it possible to write interactive webapps in another language? I’m thinking python since that is our officially supported language…
BTW, I am having a new issue now. When I try and SSH to my OOD node I get this:
channel 0: open failed: administratively prohibited: open failed
stdio forwarding failed
kex_exchange_identification: Connection closed by remote host
Connection closed by UNKNOWN port 65535
This just started happening after getting the shell functionality working in OOD so I am wondering if somehow I messed up actual ssh connections by doing that? Luckily I can get in via the OOD shell, otherwise I would be completely locked out.
First, you don’t want to be editing files in /var/www/ood, only /etc/ood/config.
Secondly, I think you want script_wrapper. One thing to note here is that job_script_content.sh is the file being submitted to Slurm. So that’s the entry point when you go looking for how this job does what it does. You’ll see there that it does all this stuff before the before script, so script_wrapper will let you add something at the immediate top.
FWIW, here are our production configurations. These sub-directories get dropped file for file, as is, into /etc/ood/config. This is how we reconfigure the bc_desktop app.
Sorry you have to click through to Basic Batch Connect Options. That page also shows how to set these options globally at the cluster level.
Also you’re about to run into this nightly bug where VNC jobs don’t work. 10 million apologies for the instability in nightly. I’ll try to fix it next week, and as it’s a nightly, as soon as I fix it it’ll be published the next day.
Maybe I will put this aside till it is fixed and focus on other things. I’m just trying to get all the cool features working so I can demo it and get people excited about it, and also to learn how to set up our production instance. I’ll probably need some guidance on how to update when the new version becomes available. Or should I be working with the latest stable release instead of a nightly?
Thanks for the pointer to the script_wrapper documentation. I wonder if I can trouble you to send me a minimal .yml.erb file that uses script_wrapper but otherwise uses all defaults. I am a bit confused by all the extraneous stuff in the repo you linked to - not sure what’s necessary and what’s not.
Going back to an earlier question - can interactive web apps be written in languages other than Ruby? I am assuming yes because RStudio and Jupyter are certainly not written in Ruby, but it would be great to confirm. Our usual stack for web apps is python + flask. Right now I’m working on getting a hello world passenger app working with that stack, and after that I’d like to try an interactive one.
Really appreciating all the support - this makes me feel confident that we’ll be in good hands when we roll this out to our users.
You can still do basic interactive apps (not vnc). Jupyter’s probably the easiest to get going.
Installing from source to get the latest stable version is likely hard/unstable to upgrade. That is, installing the latest stable release from the source may be OK - but upgrading source installations is not something we test and I doubt it would upgrade easily.
The nightly on the other hand should just be as simple as pulling down the new deb and running apt install (or apt update). But then again, you may replace some bugs with new ones.
Yes, ruby (2.7), nodejs (14) and python (3.x) are available. Defaults are given which you can override.
You ask the question again though - these are Passenger apps. Passenger apps are web stacks that boot and run on the web node. In fact - the dashboard is a Passenger app.
We don’t care about the language interactive apps are written in. They boot and run on compute nodes during a job and we just proxy to them. Either they speak HTTP (that’s basic) or we set up the VNC infrastructure to connect (that’s vnc, and also over HTTP).
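As a rough illustration of the "basic" case, here is a minimal sketch of a web server that a job could start on a compute node, using only the Python standard library. The names `Hello`, `make_server` and `demo` are hypothetical, and how the port is chosen and handed to the proxy is left out because that depends on the app template. The `demo` function only exercises the server on the loopback interface so the sketch can be run anywhere.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class Hello(BaseHTTPRequestHandler):
    def do_GET(self):
        # Answer every path with the same plain-text body.
        body = b"Hello from a compute node\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the demo output quiet


def make_server(port, host="0.0.0.0"):
    # In a job the server must listen on an address the web node can reach.
    return HTTPServer((host, port), Hello)


def demo():
    """Start the server on an ephemeral loopback port and fetch one page."""
    server = make_server(0, host="127.0.0.1")
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        url = "http://127.0.0.1:{}/".format(server.server_port)
        # Bypass any proxy settings from the environment for this local call.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(url, timeout=5) as resp:
            return resp.status, resp.read()
    finally:
        server.shutdown()
        server.server_close()


print(demo())
```

In an actual job script the last line would be replaced by something like `make_server(port).serve_forever()`, with `port` supplied by the job environment. A Flask app would fill the same role, since all that matters to the proxy is that the process speaks HTTP.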
You can use script_wrapper just like before_script. I mean: in the same place, in the same YAML. script_wrapper and before_script are siblings in whatever YAML structure you put them in (globally or just for that app).
batch_connect:
  script_wrapper: |
    echo $PATH
    which vncserver
    module list
    %s
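If a more structured report is easier to read in the job log, the same check can be sketched in Python and called from the wrapper. This is only an illustration: `find_tools` is a hypothetical helper, and the list of executables is taken from the requirements mentioned earlier in the thread (assuming the nmap package provides an `nmap` executable).

```python
import os
import shutil

# Executables mentioned in this thread as desktop-job requirements.
REQUIRED = ["vncserver", "vncpasswd", "websockify", "nmap"]


def find_tools(names, path=None):
    """Map each name to its resolved location, or None if it is not on PATH."""
    return {name: shutil.which(name, path=path) for name in names}


if __name__ == "__main__":
    # Print the PATH the job actually sees, then the result for each tool.
    print("PATH =", os.environ.get("PATH", ""))
    for name, location in find_tools(REQUIRED).items():
        print("{}: {}".format(name, location or "NOT FOUND"))
```

Any tool reported as NOT FOUND is missing from the PATH the job sees, even if an interactive login shell on the same node finds it.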
I am trying to get a Passenger app written in Python/Flask to work. When I try and run it I get an error:
File "/home/ubuntu/ondemand/dev/flask_passenger_wsgi_hello/sample/app/__init__.py", line 9, in <module>
from flask import Flask
ImportError: No module named flask
In my python script I have this:
import sys
print(sys.executable)
…and on the error page it prints out /usr/bin/python
So that tells me it is using python 2 and not python 3 which I want (and also where flask is installed).
Both my passenger_wsgi.py and my application code start with #!/usr/bin/python3 but that seems to be ignored. How can I force passenger (or whatever) to use /usr/bin/python3 or another version of python3?
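To separate the interpreter question from the Flask import, one option is a diagnostic passenger_wsgi.py with no third-party imports. The sketch below assumes the usual WSGI convention of exposing a callable named `application`, and it avoids f-strings deliberately so that it also runs under Python 2.

```python
import sys


def application(environ, start_response):
    """Plain WSGI app that reports which interpreter launched it."""
    text = "executable: {}\nversion: {}\n".format(sys.executable, sys.version)
    body = text.encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

Loading the app with this file in place shows the interpreter path and version in the browser instead of an import error, which confirms whether the shebang line has any effect.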
I have a patch for the VNC bug in right now. It’s easy enough to apply yourself, though it should hit nightly maybe tonight - latest would be the build you can download on Friday.