Clean script in "post execution" in LSF

fenz · December 7, 2022, 9:32pm

Hi!

I would like to execute the clean script for an OOD interactive app even when the job is killed because of hitting the walltime.
I know that’s the expected behaviour (Render Template — Open OnDemand 2.0.13 documentation) and there’s a suggestion to look at the “timeout” field of the batch connect which I don’t get.
In our case we run the LSF scheduler and I was thinking of adding the clean script as the “post-execution” of the job. So I was looking for a way to get the “stage root” directory in the submit.yml.erb, since I will need to add an option like: -Ep STAGE_ROOT/clean.sh but I’m not sure if this is already available or created afterwards.
At the moment I’m exploiting an env variable created by LSF (LS_EXECCWD) which seems to point to the stage root so I can pass something like: -Ep ${LS_EXECCWD}/clean.sh but I would like to do it with OOD variables.
Does anyone have the same need for the clean.sh? How can I solve it?
Thanks for your help!

travert · December 8, 2022, 2:37pm

Hey sorry, rereading this I think you will need something similar to SLURM’s Epilogues.

OOD won’t have control of the scheduler when it hits the wall time, so this will need to be something that the actual scheduler does. We use epilogues at OSC for SLURM to handle these types of issues where a cleanup happens when the walltime is hit. So if there’s an analog to SLURM’s epilogues that would be the way to go.

I know it’s not what you need, but here’s SLURM’s epilogue guide:
https://slurm.schedmd.com/prolog_epilog.html

fenz · December 11, 2022, 3:28pm

Hi,

Maybe I expressed myself badly. LSF does have something (option -Ep); the thing is that this option needs the “path” of the script to execute (clean.sh).
The issue I have is to access the “staged root” path to specify the clean.sh like:
bsub -Ep $STAGED_ROOT/clean.sh.
Is it possible in the “submit.yml” to access a variable which points to that folder? Or that’s a folder created only afterwards?
Based on what you commented, is there an example of how SLURM’s epilogue is used to run the clean script?

travert · December 12, 2022, 3:37pm

Ok, for SLURM that epilogue command is in the slurm.conf file itself, so OOD is not doing anything. SLURM just knows by that config what to run and do if the time is hit for the job, which the scheduler is keeping track of.

Is there a configuration file that you can set for LSF telling it where this epilogue script lives? The idea of the epilogue is the scheduler is handling this, so this wouldn’t be something set in OOD.

fenz · December 12, 2022, 9:19pm

Ok, but every app has its own “clean.sh” script and I would like to call that one. A generic epilogue is configured for all jobs, so I feel that’s a bit too invasive.
Each time you start an app in OOD it creates a folder (staged root if I got it right) and copies there the clean.sh script. If I can get a variable pointing to that path I solved all my problems.
Would it be possible?

fenz · December 12, 2022, 9:22pm

And yes, LSF has an option to tell it where the epilogue script lives. If I know the path where the clean.sh is, I can use that as the path for the LSF option.

travert · December 13, 2022, 2:15pm

I think I see what you mean finally, thanks for the patience.

You are correct that the staged_root is something you can access for the job when it runs, with the clean.sh file in that location being run by the scheduler per job rather than system wide, which took me a bit to see. This is largely something the submit.yml.erb can take care of, with something like:

script:
  native:
    - '-Ep'
    - "<%= context.staged_root %>/clean.sh"

I need to play with this some myself but we have a downtime at the moment. I’ll post back more this afternoon.

fenz · December 13, 2022, 2:56pm

Yes, I mean batch connect apps. For example, take RStudio: in the template folder of the app you put all the scripts you need, for example the clean.sh: bc_osc_rstudio_server/script.sh.erb at master · OSC/bc_osc_rstudio_server · GitHub
This script is only for RStudio. Maybe you created a temp folder using a command in the “before.sh” script of the RStudio app and want to clean it using the clean.sh. The folder may be needed only for RStudio; you don’t have it in Jupyter and you don’t clean it in the “clean.sh” script of the Jupyter app. So this is what I mean by “every app has its own clean script”.
The problem is this clean script runs when the “main” script exits but not when the job hits the walltime (as you also confirmed in the last comment).
I have a way to call it from LSF, in the “native” section of submit.yml (bc_osc_rstudio_server/submit.yml.erb at master · OSC/bc_osc_rstudio_server · GitHub) I can add:
…
native:
  - -Ep STAGE_ROOT/clean.sh
…

And the cleanup will be called by LSF when the job is killed (like the epilogue in SLURM). So the LSF configuration is fine, the option is there and all the “work” will be done by the scheduler. All good.
My only issue is I don’t have the “STAGE_ROOT” path, I don’t know what to use as “STAGE_ROOT” since I’m not sure I have access to that folder in the “submit.yml”.
STAGE_ROOT is created by OOD when submitting the app, it is the folder in your HOME (like ~/ood/data/…/rstudio ) where all the scripts are copied. I can access this folder in other scripts (bc_osc_rstudio_server/script.sh.erb at master · OSC/bc_osc_rstudio_server · GitHub) but not in the submit.
I hope I have explained it more clearly now.
Many thanks for your help!

fenz · December 13, 2022, 2:58pm

sorry, was writing the answer in the meantime. That’s exactly what I mean

travert · December 13, 2022, 3:00pm

Yep! I see what you mean, thanks again for the patience. I think what you are after here is that ERB in the line above to get that variable with context.staged_root (we use the context object to, well, get context in OOD) so that should dynamically set that path for you when you run the.

fenz · December 13, 2022, 3:07pm

That’s what I tried in the past and just to be sure I tried again now:

undefined local variable or method `context' for #<BatchConnect::SessionContext:0x0000559ca0a4b568>

- The RStudio Server session data for this session can be accessed under the staged root directory.

So I thought the path was created afterwards and not available in the “submit”, but I saw it correctly “linked” in the error message.

fenz · December 13, 2022, 3:14pm

Based on the finding about the link being shown in the error, I saw it comes from here: ondemand/new.html.erb at v2.0.29 · OSC/ondemand · GitHub.
So maybe I can use this OodAppkit to get the path. I’ll give it a try.

fenz · December 14, 2022, 8:49am

Not sure you were waiting for my answer. I failed to use other “class” or instance variables to access that info. So, if you have anything I can try just let me know.

jeff.ohrstrom · December 14, 2022, 2:50pm

Sorry it should be <%= staged_root %> directly with no context.

fenz · December 14, 2022, 3:30pm

This works! Many thanks both for your support!
Is it possible to disable execution of clean.sh? I mean now it will always be called by LSF so I don’t need to have it run by OOD. I think I can just rename the file but was wondering if there’s an “official” way of doing it.

jeff.ohrstrom · December 14, 2022, 4:11pm

Renaming the file is just fine. Can’t say it’s ‘official’ but it’s the easiest and makes the most sense, so it’s fairly close.

fenz · December 14, 2022, 5:01pm

So I was close as well. Many thanks!

fenz · December 15, 2022, 10:32am

So for completeness (in case needed by anyone else). My final solution is to add the “-Ep” option to LSF in the submit.yml:
…
script:
  queue_name: "interactive"
  native:
    - -Ep ""<%= staged_root %>/post.sh >> <%= staged_root %>/output.log 2>&1""
…
So I called the script ‘post.sh’ and append its output (both stdout and stderr) to the output.log.
The main difference from the “clean.sh” is that this script is not executed in the same “environment”, so the env variables exported in “before.sh”, “script.sh” or others are not available in the “post.sh”. My solution was to “build” the post.sh in the “before.sh” (I ‘echo’ the commands with the right values into the post.sh).
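For anyone wanting to try the same approach, here is a rough sketch of what such a “before.sh” fragment could look like. It is not the exact script from this thread: STAGED_ROOT and APP_TMP_DIR are made-up names for illustration (neither OOD nor LSF defines them), and it assumes the job starts in the staged root and that the paths contain no quote characters. It uses a heredoc instead of individual ‘echo’ lines, but the idea is the same: values are expanded when post.sh is generated, so the script does not depend on the job environment later.

```shell
#!/usr/bin/env bash
# Sketch of a before.sh fragment that generates post.sh.
# STAGED_ROOT and APP_TMP_DIR are illustrative assumptions, not OOD/LSF variables.
set -euo pipefail

# Assumption: the job starts in the staged root, so default to the current directory.
STAGED_ROOT="${STAGED_ROOT:-$PWD}"

# Hypothetical per-session scratch directory that must be removed afterwards.
APP_TMP_DIR="$(mktemp -d "${TMPDIR:-/tmp}/ood_session.XXXXXX")"
export APP_TMP_DIR

# Unquoted heredoc delimiter: ${APP_TMP_DIR} is expanded now, so the generated
# post.sh contains the literal path and needs no variables from this environment.
cat > "${STAGED_ROOT}/post.sh" <<EOF
#!/usr/bin/env bash
# Generated by before.sh; values were filled in at generation time.
echo "post.sh: removing ${APP_TMP_DIR}"
rm -rf -- "${APP_TMP_DIR}"
EOF

# The scheduler has to be able to execute the generated script.
chmod +x "${STAGED_ROOT}/post.sh"
```

The generated post.sh can then be passed to LSF with the -Ep option as in the submit.yml snippet above.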
Thanks again all for your help.

fenz · December 15, 2022, 10:35am

@jeff.ohrstrom I just figured out I didn’t mention LSF in the title so probably adding “in LSF” would be a good idea. I can’t change it but maybe you can.

jeff.ohrstrom · December 15, 2022, 2:35pm

I changed the title. Thanks!
