Simple Question: Execute Python code on an LSF submit host

Hi,
So after some digging, a few batch submissions, and combing through these forums, this is the fix that needs to happen:
As noted above, OnDemand sets this correctly for Slurm:
“Script location” = SLURM_SUBMIT_DIR = PWD
So for LSF we just need OnDemand to set almost the same thing:
“Script Location” = LS_SUBCWD = PWD
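Until OnDemand does this for LSF, a stopgap is to cd into LS_SUBCWD at the top of the job script itself. A minimal sketch (the queue and fallback to $PWD are my additions; LSF only sets LS_SUBCWD once the job is dispatched, so outside LSF the script stays where it is):

```shell
#!/bin/bash
#BSUB -J pythonjob
#BSUB -o pythonjob.%J.out
# LSF sets LS_SUBCWD to the directory bsub was invoked from.
# Fall back to the current directory when running outside LSF.
cd "${LS_SUBCWD:-$PWD}" || exit 1
pwd
```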

How can we get this done?
–David

OOD_DATAROOT could be a long-term solution, but even then, if you set it to $HOME, you’d still get directories like $HOME/myjobs/projects/34.

Are you still sshing into another machine to submit jobs? If so, that’s why the file isn’t found: a non-interactive ssh session starts in your $HOME, so you’re in $HOME when you submit the job, and that becomes the job’s $CWD.
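The fix on the ssh path is to change directory on the remote side before running bsub. A sketch (submithost and the project path are placeholders; the last two lines just simulate the default behavior locally, without needing ssh or LSF):

```shell
# Today: the remote shell starts in $HOME, so the job's CWD is $HOME.
#   ssh submithost 'bsub < main_job.sh'
# Fix: cd on the remote side before submitting.
#   ssh submithost 'cd /path/to/project && bsub < main_job.sh'

# Simulate the default: a fresh login shell lands in $HOME.
job_cwd="$(cd "$HOME" && pwd)"
echo "$job_cwd"
```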

Sorry I appear to have had that message queued up all morning.

But yes, SLURM_SUBMIT_DIR = PWD because I am not sshing into another machine and thus changing my PWD.

Hi,
I did change OOD_DATAROOT and it made the path shorter. I put /home/$USER/ondemand in these files:
/etc/ood/config/apps/myjobs/env
/etc/ood/config/apps/dashboard/env
But then Open OnDemand appended “projects/default/2”.
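For reference, those env files are plain KEY=value lines; this is roughly what was set (a sketch, using the path from above):

```shell
# /etc/ood/config/apps/myjobs/env and /etc/ood/config/apps/dashboard/env
OOD_DATAROOT="/home/$USER/ondemand"
```

Even with this set, the My Jobs app still appends its own projects/&lt;name&gt;/&lt;id&gt; subdirectory under that root.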
So to recap:
Whether I ssh to a submit host or use the OnDemand server itself as the submit host, it doesn’t work, because OnDemand never sets “Script Location” = LS_SUBCWD = PWD.
It will always say “your python/R/etc. isn’t found”.

And just to be clear, the modules aren’t a problem if I “copy environment”.
–David

This is very confusing to me, because I took a look and confirmed that we change into the correct directory when submitting the job. So when we issue bsub, we’re in the correct directory.

So if you’re indeed using the ondemand server as a submit host, it really should be setting the correct PWD.

In Slurm, I can query the database and get additional information about a job using sacct. I just ran this job and checked its WorkDir, and it’s correct.

[4(master)]  pwd
/users/PZS0714/johrstrom/ondemand/src/apps/myjobs/data/projects/default/4
[4(master)]  ls 
main_job.sh  slurm-31225227.out
[4(master)]  sacct -j 31225227 -o WorkDir%100
                                                                                             WorkDir 
---------------------------------------------------------------------------------------------------- 
                           /users/PZS0714/johrstrom/ondemand/src/apps/myjobs/data/projects/default/4 
                                                                                                     
                                                                                                     

https://slurm.schedmd.com/sacct.html

Does LSF have a similar command to sacct with a similar field WorkDir?

Hi,
Yes we do, it’s bacct. Here is the output of a failed job:
Job <29989850>, Job Name , User , Project , Status ,
Queue <foo_normal>, Command <#!/bin/bash;#BSUB -q foo_normal;#BSUB -J pythonjob #LSF Job Name;#BSUB -o pythonjob.%J.out #Name of the job output file;#BSUB -e pythonjob.%J.out #Name of the job error file;### -- send notification at start --;#BSUB -B;### -- send notification at completion --;#BSUB -N;module load python/3.10;env;pwd;python --version;python hello.py>, Share group charged
Fri Jun 14 10:48:59: Submitted from host , CWD <$HOME>, Output File <pythonjob.%J.out>;
Fri Jun 14 10:49:00: Dispatched to , Effective RES_REQ <select[type == local] order[r15s:pg] affinity[core(1)*1] >;
Fri Jun 14 10:49:00: Completed .

Accounting information about this job:
Share group charged
CPU_T WAIT TURNAROUND STATUS HOG_FACTOR MEM SWAP
0.07 1 1 exit 0.0717 0M 0M
CPU_PEAK CPU_EFFICIENCY MEM_EFFICIENCY
0.00 0.00% 0.00%

SUMMARY: ( time unit: second )
Total number of done jobs: 0 Total number of exited jobs: 1
Total CPU time consumed: 0.1 Average CPU time consumed: 0.1
Maximum CPU time of a job: 0.1 Minimum CPU time of a job: 0.1
Total wait time in queues: 1.0
Average wait time in queue: 1.0
Maximum wait time in queue: 1.0 Minimum wait time in queue: 1.0
Average turnaround time: 1 (seconds/job)
Maximum turnaround time: 1 Minimum turnaround time: 1
Average hog factor of a job: 0.07 ( cpu time / turnaround time )
Maximum hog factor of a job: 0.07 Minimum hog factor of a job: 0.07
Average expansion factor of a job: 1.00 ( turnaround time / run time )
Maximum expansion factor of a job: 1.00
Minimum expansion factor of a job: 1.00
Total Run time consumed: 0 Average Run time consumed: 0
Maximum Run time of a job: 0 Minimum Run time of a job: 0
Scheduler Efficiency for 1 jobs
Slot Utilization: - Memory Utilization: -

real 11m25.629s
user 11m14.474s
sys 0m10.728s
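To pull just the submission CWD out of that output, a small filter works. A sketch against the line quoted above (the sample line is pasted in so it runs without LSF; against a live cluster you would pipe `bacct -l <jobid>` through the same sed, and the host field is left blank here as in the obfuscated output):

```shell
# The "Submitted from host" line of `bacct -l` carries the CWD <...> field.
bacct_line='Fri Jun 14 10:48:59: Submitted from host , CWD <$HOME>, Output File <pythonjob.%J.out>;'
cwd="$(printf '%s\n' "$bacct_line" | sed -n 's/.*CWD <\([^>]*\)>.*/\1/p')"
echo "$cwd"   # → $HOME (literally: the job was submitted from the home directory)
```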

Did you obfuscate this here? If I’m reading this right, the CWD when the job was submitted was $HOME? Is the host field here the OOD web server or the submit_host you’re sshing into?

Hi,
Honestly I can’t remember if it was in home; it was probably /home/david/ondemand. In NEW news, I blew away the whole server, installed RHEL 8.10 and ondemand-3.1.4-1.el8.x86_64, and made the server a submit server for LSF. Let’s call this server vultannew. Let me get back to you with the results. Have a good weekend!
–David

This topic was automatically closed 180 days after the last reply. New replies are no longer allowed.