Controlling Working Directory for JobComposer jobs?

Hi,

A question came up. If I understand it correctly, all OOD SLURM jobs get submitted from under $HOME/ondemand/something?

Many, if not most, HPC centres have a small /home partition that they discourage running jobs from, and large parallel filesystems like /scratch or /project that they want users to use instead of /home.

With OOD, is there a way to control the SLURM working directory for JobComposer?
Thanks!

Grigory Shamov
University of Manitoba

Hi - sorry for not getting to this sooner!

Yes, you can change the ‘dataroot’ by setting the environment variable OOD_DATAROOT to whatever you like in the /etc/ood/config/apps/myjobs/env file.

Though you should note that this environment variable is the same for everyone, so you need something like $USER in it, e.g. /some/other/path/$USER, so that everyone has their own directory.
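For example, the env file could point the Job Composer at a scratch filesystem (the /scratch path below is only an illustration; substitute whatever large filesystem your site provides):

```shell
# /etc/ood/config/apps/myjobs/env
# Illustrative path only: /scratch is an assumption; use your site's filesystem.
# $USER is expanded per user, so each user gets a private dataroot.
OOD_DATAROOT="/scratch/$USER/ondemand"
```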


Hello, Jeff.
Hope you are well!

I’ll complicate the previous question a little.
How can I change the ondemand/other/paths/ directory that is created in /home/$USER/ upon first login to, for example, ood/other/paths/?

Set the same environment variable, only this time in the dashboard’s env file: /etc/ood/config/apps/dashboard/env.
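A minimal sketch of that env file, reusing the ood/other/paths example from the question:

```shell
# /etc/ood/config/apps/dashboard/env
# Path reuses the ood/other/paths example from the question; adjust to taste.
OOD_DATAROOT="/home/$USER/ood/other/paths"
```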


Thank you, it almost works!

The situation is as follows.
The following variables are set:
in /etc/ood/config/apps/dashboard/env
OOD_DATAROOT="/home/$USER/ood/dashboard/"
in /etc/ood/config/apps/myjobs/env
OOD_DATAROOT="/home/$USER/ood/myjobs/"

Upon successful login, the following directory tree is created automatically:
/home/$USER/ood/dashboard/batch_connect/cache

When I go to the “Job Composer” page, the following directory and file are created automatically:
/home/$USER/ood/myjobs/production.sqlite3

But I get an error.

And the Apache log has the following entry:

[Wed Jul 24 11:34:06.492539 2024] [lua:info] [pid 13924:tid 140649684747840] [client 172.17.2.38:59998] req_hostname="web-hpc.frccsc.ru" req_filename="proxy:http://localhost/pun/sys/myjobs" req_is_https="true" log_hook="ood" res_content_type="text/html; charset=utf-8" req_origin="" log_time="2024-07-24T08:34:06.492434.0Z" req_accept="text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7" req_is_websocket="false" res_content_language="" remote_user="sdenisov" req_accept_charset="" res_content_length="1266" req_accept_encoding="gzip, deflate, br, zstd" req_handler="proxy-server" req_user_ip="172.17.2.38" req_referer="https://web-hpc.frccsc.ru/pun/sys/dashboard/" req_status="500" req_user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36" req_port="443" req_cache_control="" res_content_disp="" allowed_hosts="web-hpc.frccsc.ru" time_proxy="5.378" res_location="" req_method="GET" res_content_encoding="" req_protocol="HTTP/1.1" req_content_type="" local_user="sdenisov" res_content_location="" req_accept_language="ru-ru,ru;q=0.9,en-us;q=0.8,en;q=0.7" time_user_map="0.002" req_uri="/pun/sys/myjobs" req_server_name="web-hpc.frccsc.ru", referer: https://web-hpc.frccsc.ru/pun/sys/dashboard/

Is there a way to fix this?

You’ll have to check /var/log/ondemand-nginx/$USER/error.log or retrieve the /tmp html file it generated to see what the issue may be.

After getting the error, /var/log/ondemand-nginx/$USER/error.log says:

App 20875 output: [2024-07-24 17:23:28 +0300 ]  INFO "[b2499400-7237-4b5e-acd0-5c908643310c] method=GET path=/pun/sys/myjobs/ format=html controller=WorkflowsController action=index status=500 error='ActiveRecord::StatementInvalid: Could not find table 'jobs'' allocations=375 duration=1.86 view=0.00 db=0.96"
App 20875 output: [2024-07-24 17:23:28 +0300 ] FATAL "[b2499400-7237-4b5e-acd0-5c908643310c]   \n[b2499400-7237-4b5e-acd0-5c908643310c] ActiveRecord::StatementInvalid (Could not find table 'jobs'):\n[b2499400-7237-4b5e-acd0-5c908643310c]   \n[b2499400-7237-4b5e-acd0-5c908643310c] app/models/workflow.rb:16:in `block in <class:Workflow>'\n[b2499400-7237-4b5e-acd0-5c908643310c] app/controllers/workflows_controller.rb:225:in `update_jobs'"

What you need to do is remove the production.sqlite3 file, then navigate to the Job Composer through the navigation bar. That is, do not go to the URL directly; click the link in the navigation bar.

Oh and restart the webserver in the Help menu after you remove the file.
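Sketched as shell commands, run as the affected user (the path assumes the OOD_DATAROOT value shown in the env file above):

```shell
# Remove the stale Job Composer database so it is re-initialized on next visit.
# Path assumes OOD_DATAROOT="/home/$USER/ood/myjobs/" from the env file above.
rm -f "$HOME/ood/myjobs/production.sqlite3"
# Then use "Restart Web Server" in the dashboard's Help menu and re-open the
# Job Composer via the navigation bar link (not by typing the URL directly).
```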

I did as you wrote:

  • deleted production.sqlite3
  • restarted the webserver through the Help menu
  • navigated to the job composer through the link in the navigation bar

Unfortunately it didn’t help. There is the same error in the log.

OK there could be an issue with using different dataroots for the different applications. Can they both use the same OOD_DATAROOT environment variable?

If you mean using, for example, OOD_DATAROOT="/home/$USER/ood/" both in /etc/ood/config/apps/dashboard/env and /etc/ood/config/apps/myjobs/env,
it doesn’t work; the errors in the log are the same.

You have to be careful with the order of operations here. This error happens when you navigate to the URL directly without going through the dashboard’s link, so be careful there.

You should also restart the webserver every time you want the apps to pick up new configuration. Lastly, every time it fails you need to remove the sqlite3 file.

Sorry, I didn’t pay enough attention to your words “This error happens when you navigate to the URL directly without going through the dashboard’s link”.

The fact is that I configured the navigation bar via a *.yml file in /etc/ood/config/ondemand.d/.
It contained these lines:

- title: "Custom-1"
  links:
    - title: "Custom-2"
      url: "/pun/sys/dashboard/activejobs"
    - title: "Custom-3"
      url: "/pun/sys/myjobs"
      new_tab: true

As soon as I returned to the standard menu using "- jobs", everything worked as it should.

Moreover, the OOD_DATAROOT variables work with different path values:
in /etc/ood/config/apps/dashboard/env
OOD_DATAROOT="/home/$USER/ood/dashboard/"
in /etc/ood/config/apps/myjobs/env
OOD_DATAROOT="/home/$USER/ood/myjobs/"

Thanks for the reply! I was worried for a second something may be up, but that makes sense. If you do wish to use a custom navigation bar entry for the Job Composer, this is the URL you should use (it’s the same URL as the default navigation bar link):

url: '/pun/sys/dashboard/apps/show/myjobs'
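Putting it together, the custom entry from the earlier snippet could then read (the Custom-* titles are the placeholders from that snippet):

```yaml
# fragment of the navigation configuration in /etc/ood/config/ondemand.d/*.yml
- title: "Custom-1"
  links:
    - title: "Custom-3"
      url: "/pun/sys/dashboard/apps/show/myjobs"
      new_tab: true
```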

I put this link, everything works. Thank you!

We discussed this topic in great detail; I have only one last question: is “Restart the webserver” in the Help menu equivalent to “systemctl restart apache2”?

No. Everyone has their own Per-User NGINX (PUN) behind Apache. You’re bouncing your own nginx instance when you ‘restart the webserver’, so you’re not affecting other users’ PUNs, nor are you bouncing apache2 and affecting other users.
