Summary of timeout: do I understand this correctly?
For an interactive app based on the vnc template, there is a ‘websockify_timeout_seconds’ that can be expanded to 20 seconds. This is an upper limit due to websockify internals, iirc.
For basic and template, anecdotally, one can apply another timeout in the ../template/after.sh.erb
if wait_until_port_used "${host}:${port}" 160; then
The 160 is an arbitrary local choice. This can be much longer, and is (I think) independent of the websockify timeout.
I don’t understand what timeout this setting actually affects, but we’ve seen it be effective.
So finally – the desktop app, which does use the vnc app, does not have an open structure with ‘after.sh(.erb)’. Is there another option to implement a timeout for the desktop app? Is there another app template that is now superseding the bc_desktop?
There’s a few things here that are a bit off and some I’m not sure of myself.
I’m unsure if there is a hard 20 second ceiling for websockify itself. I couldn’t find anything online saying this either, but I could be mistaken. I’m just not sure if that’s true.
OOD itself has 3 types of apps: basic, vnc, and an external contributor created vnc_container, which exists but is much less common currently.
wait_until_port_used just makes the bash script wait before it writes the connection.yml file for OOD. So if network latency, heavy load, or anything else causes the job to take a while to spin up, this can help ensure the app waits before timing out.
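To make that waiting behavior a bit more concrete, here is a minimal sketch of the idea in plain bash. To be clear, this is not OOD's implementation of wait_until_port_used: the names `wait_until` and `port_open` are hypothetical, made up for illustration, and the port check assumes bash's `/dev/tcp` feature is available on the node.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration only; these are NOT part of OOD.

# Retry a probe command once per second until it succeeds
# or until roughly <timeout> seconds have passed.
# Usage: wait_until <timeout_seconds> <command> [args...]
wait_until() {
  local timeout="$1"
  shift
  local waited=0
  until "$@"; do
    if [ "$waited" -ge "$timeout" ]; then
      return 1  # gave up: the probe never succeeded in time
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0      # the probe succeeded
}

# Probe: succeeds if something accepts a TCP connection on host:port.
# Assumes bash's /dev/tcp pseudo-device (a bash feature, not a real file).
port_open() {
  (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null
}

# Example call, commented out because it can block for up to 160 seconds:
# if wait_until 160 port_open "${host}" "${port}"; then
#   echo "port ${port} is up"
# else
#   echo "timed out waiting for port ${port}"
# fi
```

The timeout here only bounds how long the script keeps polling before it gives up, which is why a value like 160 can be an arbitrary local choice.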
I’m not sure what you mean here. You can use the after.sh.erb with a vnc app. I think you also have the option of using the script_wrapper in the submit to issue the wait_until_port_used. It’s all about finding the ways OOD lets you inject bash, which is those scripts and the script_override for the most part.
Finally, bc_desktop is not a template. The templates are only basic, vnc and vnc_container. And by template I mean the thing you set in your submit.yml for the app.
I do see how it could be confusing given the doc is titled Basic Batch Connect, but the key thing to notice in that file is the template:
batch_connect:
  template: "basic" # this is one of the 3 template types OOD understands
...
This is what the OOD team means when they talk about the “types of scientific apps” OOD can run.
Also, you do have the option of moving some of this up into a clusters.d file and putting the batch_connect stuff there to set it across the cluster as well. I was just trying to show a more modular approach.
Hi, Travis – Thanks for providing this guidance. Appreciate you.
Thanks particularly for the comments about ‘bash injection’ as the high-level perspective to take. For the bc_desktop, I was too narrow in my perspective by looking at /etc/ood/config/apps/bc_desktop, and forgetting that the working desktop app can certainly accept the ../template/before.sh.erb and ./template/script.sh.erb, so why not introduce the after.sh.erb there as well? Thanks again for reorienting me.
And a colleague was showing me this morning that the clusters.d definitions also allow for the bash injection – so I think it’s coming together now.
OOD itself has 3 types of apps: basic, vnc, and an external contributor created vnc_container, which exists but is much less common currently.
Yeah, I guess I had previously heard that the vnc_container emerged from the community. Maybe in the Appverse affinity group, or some other such, I’ll encounter people who have developed to leverage that. Do you recall the motivation that led to ‘vnc_container’ being developed? At the highest level, it must be to better support containerized apps ( : I will dig into the github repo to follow-up on this exchange.
I’m unsure if there is a hard 20 second ceiling for websockify itself. I couldn’t find anything online saying this either, but I could be mistaken. I’m just not sure if that’s true.
Good point. I’ve been trying to find this reference again. I quoted this to the team at Case, since we’ve been struggling with network latency. What I recall reading was essentially “if you need more than 20 s for the port communication, there’s a deeper problem that shouldn’t be papered over with a longer timeout.”
True or not, I’ll admit that might be a bit beside the point from an operational point of view.
Technical review question. To implement the websockify timeout in submit.yml.erb, I’ve previously just done the following – is this actually an effective approach?
I’ve not yet confronted the syntax for a cluster definition. Would the timeout go alongside the script_wrapper, or within? From submit.yml.erb, I’m thinking it’s the same level…
For the websockify_timeout_seconds I’m a bit unsure looking at this where it goes. We don’t seem to call it out in the docs explicitly that I could find (but maybe it’s in there somewhere), yet in the source code I see it being used in the backend ood_core code here:
This makes me think it should be there in the submit.yml to get picked up and help setup the environment for the job.
For the indentation questions, the doc examples are likely the best bet to see where to place the indentation.
For the cluster files, you can see an example here with the batch_connect stanza and even a script_wrapper for each type of app launched on that cluster, so you see a different command run whether it’s basic or vnc on that cluster:
But this is only if you want that to run for every job submitted to that cluster. For setting things like where the websockify command is located, or ensuring modules are purged on job launch, this feature can be quite helpful.
I hope that helps and clears some things up, but let me know if you have any more questions or if any of that is unclear.
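As a rough illustration of what a script_wrapper does: as I understand the OOD docs, the wrapper is a block of bash containing a `%s` placeholder, and the app's generated script is dropped in where the `%s` sits. The sketch below just mimics that substitution with `printf`. It is not how OOD does it internally, and the wrapper text and the one-line job script are made-up stand-ins.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper text; "%s" marks where the job script is placed.
wrapper='module purge
%s'

# Made-up stand-in for the script generated for the app.
job_script='echo "starting app"'

# Substitute the job script into the wrapper, using the wrapper
# as a printf format string.
wrapped=$(printf "$wrapper" "$job_script")

# Show the combined script that would run on the compute node.
printf '%s\n' "$wrapped"
```

Anything placed before the `%s` runs ahead of the app's own script, which is what makes the wrapper a handy spot for things like purging modules.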