Hi,
I have a close-to-working Open OnDemand dev environment set up, but am having issues getting GUI job submissions (Job Composer) and GUI desktop environment jobs to work. Both fail with errors about being unable to contact the Slurm controller, or about the cluster not being set.
I successfully tested submitting jobs from the terminal using the test procedure here:
https://osc.github.io/ood-documentation/latest/installation/resource-manager/test.html
I confirmed that it worked by checking the slurmctld logs on the controller node.
Is there an extra step I am missing to get the Job Composer working? Is there another log I should be looking at?
Thanks,
Miles
v2:
  metadata:
    title: "plan9"
  login:
    host: "exadev2.ohsu.edu"
  job:
    adapter: "slurm"
    bin: "/usr/bin/"
    conf: "/etc/slurm/slurm.conf"
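The `job` section above is what produces the `execve` line in the log that follows: `SLURM_CONF` is set to the `conf` value and `sbatch` is resolved under `bin`. Here is a minimal Python sketch that rebuilds that call as a shell command, so it can be run by hand, as the same user, on the OnDemand web server. It assumes the account `acc` seen in the log, and that the job script is fed to `sbatch` on stdin, since the logged argv contains no script path. As far as I know, when no `submit_host` is configured OnDemand runs `sbatch` on the web server itself rather than on the `login` host, so running the command there is the closer comparison to what the Job Composer does.

```python
from __future__ import annotations

import shlex
from pathlib import Path


def build_sbatch_call(bin_dir: str, conf: str, account: str) -> tuple[dict[str, str], list[str]]:
    """Rebuild the env and argv shown on the Job Composer's execve log line.

    Assumption: sbatch lives directly under the cluster config's `bin` directory,
    and the flags are the ones the log shows for this workflow.
    """
    env = {"SLURM_CONF": conf}
    argv = [str(Path(bin_dir) / "sbatch"), "-A", account, "--export", "NONE", "--parsable"]
    return env, argv


def as_shell_command(env: dict[str, str], argv: list[str]) -> str:
    """Render env + argv as one shell-safe command line."""
    assignments = [f"{key}={shlex.quote(value)}" for key, value in env.items()]
    return " ".join(["env", *assignments, *(shlex.quote(arg) for arg in argv)])


if __name__ == "__main__":
    # Values copied from the cluster config and the execve log line.
    env, argv = build_sbatch_call("/usr/bin/", "/etc/slurm/slurm.conf", "acc")
    print(as_shell_command(env, argv))
```

The printed command is `env SLURM_CONF=/etc/slurm/slurm.conf /usr/bin/sbatch -A acc --export NONE --parsable`. Piping a trivial script into it on the web server, for example `printf '#!/bin/bash\nhostname\n' | <that command>`, should either print a job ID or reproduce the `Unable to contact slurm controller` error outside of OnDemand.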
App 5736 output: [2023-09-12 17:12:07 -0400 ] DEBUG "[562dbe76-2c56-495c-b522-277c12f06bfb] Workflow Load (2.4ms) SELECT \"workflows\".* FROM \"workflows\" WHERE \"workflows\".\"id\" = ? LIMIT ? [[\"id\", 1], [\"LIMIT\", 1]]"
App 5736 output: [2023-09-12 17:12:07 -0400 ] DEBUG "[562dbe76-2c56-495c-b522-277c12f06bfb] Job Load (1.8ms) SELECT \"jobs\".* FROM \"jobs\" WHERE \"jobs\".\"workflow_id\" = ? [[\"workflow_id\", 1]]"
App 5736 output: [2023-09-12 17:12:07 -0400 ] INFO "[562dbe76-2c56-495c-b522-277c12f06bfb] execve = [{\"SLURM_CONF\"=>\"/etc/slurm/slurm.conf\"}, \"/usr/bin/sbatch\", \"-A\", \"acc\", \"--export\", \"NONE\", \"--parsable\"]"
App 5736 output: [2023-09-12 17:12:16 -0400 ] ERROR "[562dbe76-2c56-495c-b522-277c12f06bfb] An error occurred when submitting jobs for simulation 1: sbatch: error: Batch job submission failed: Unable to contact slurm controller (connect failure)"
App 5736 output: [2023-09-12 17:12:16 -0400 ] INFO "[562dbe76-2c56-495c-b522-277c12f06bfb] method=PUT path=/pun/sys/myjobs/workflows/1/submit format=html controller=WorkflowsController action=submit status=302 duration=9014.72 view=0.00 db=4.21 location=https://openondemanddev.ohsu.edu/pun/sys/myjobs/workflows"
App 5736 output: [2023-09-12 17:12:16 -0400 ] DEBUG "[12680650-8366-4e51-b025-bc43c7a2bbc5] Workflow Load (2.0ms) SELECT \"workflows\".* FROM \"workflows\" INNER JOIN \"jobs\" ON \"jobs\".\"workflow_id\" = \"workflows\".\"id\" WHERE \"jobs\".\"status\" IN (?, ?, ?, ?) [[\"status\", \"H\"], [\"status\", \"Q\"], [\"status\", \"R\"], [\"status\", \"S\"]]"
App 5736 output: [2023-09-12 17:12:16 -0400 ] DEBUG "[12680650-8366-4e51-b025-bc43c7a2bbc5] SQL (1.8ms) SELECT \"workflows\".\"id\" AS t0_r0, \"workflows\".\"created_at\" AS t0_r1, \"workflows\".\"updated_at\" AS t0_r2, \"workflows\".\"job_attrs\" AS t0_r3, \"workflows\".\"name\" AS t0_r4, \"workflows\".\"batch_host\" AS t0_r5, \"workflows\".\"staged_dir\" AS t0_r6, \"workflows\".\"script_name\" AS t0_r7, \"jobs\".\"id\" AS t1_r0, \"jobs\".\"workflow_id\" AS t1_r1, \"jobs\".\"status\" AS t1_r2, \"jobs\".\"job_cache\" AS t1_r3, \"jobs\".\"created_at\" AS t1_r4, \"jobs\".\"updated_at\" AS t1_r5 FROM \"workflows\" LEFT OUTER JOIN \"jobs\" ON \"jobs\".\"workflow_id\" = \"workflows\".\"id\""
App 5736 output: [2023-09-12 17:12:16 -0400 ] INFO "[12680650-8366-4e51-b025-bc43c7a2bbc5] method=GET path=/pun/sys/myjobs/workflows format=html controller=WorkflowsController action=index status=200 duration=8.11 view=2.43 db=3.83"
App 5736 output: [2023-09-12 17:12:16 -0400 ] DEBUG "[42c31484-8679-4c27-92ad-d8cd2fbed03b] Workflow Load (2.0ms) SELECT \"workflows\".* FROM \"workflows\" INNER JOIN \"jobs\" ON \"jobs\".\"workflow_id\" = \"workflows\".\"id\" WHERE \"jobs\".\"status\" IN (?, ?, ?, ?) [[\"status\", \"H\"], [\"status\", \"Q\"], [\"status\", \"R\"], [\"status\", \"S\"]]"
App 5736 output: [2023-09-12 17:12:16 -0400 ] DEBUG "[42c31484-8679-4c27-92ad-d8cd2fbed03b] Workflow Load (1.7ms) SELECT \"workflows\".* FROM \"workflows\" WHERE \"workflows\".\"id\" = ? LIMIT ? [[\"id\", 1], [\"LIMIT\", 1]]"
App 5736 output: [2023-09-12 17:12:16 -0400 ] DEBUG "[42c31484-8679-4c27-92ad-d8cd2fbed03b] Job Load (1.7ms) SELECT \"jobs\".* FROM \"jobs\" WHERE \"jobs\".\"workflow_id\" = ? [[\"workflow_id\", 1]]"
App 5736 output: [2023-09-12 17:12:16 -0400 ] DEBUG "[42c31484-8679-4c27-92ad-d8cd2fbed03b] CACHE Workflow Load (0.0ms) SELECT \"workflows\".* FROM \"workflows\" WHERE \"workflows\".\"id\" = ? LIMIT ? [[\"id\", 1], [\"LIMIT\", 1]]"
App 5736 output: [2023-09-12 17:12:16 -0400 ] DEBUG "[42c31484-8679-4c27-92ad-d8cd2fbed03b] Job Load (1.6ms) SELECT \"jobs\".* FROM \"jobs\" WHERE \"jobs\".\"workflow_id\" = ? ORDER BY \"jobs\".\"id\" DESC LIMIT ? [[\"workflow_id\", 1], [\"LIMIT\", 1]]"
App 5736 output: [2023-09-12 17:12:16 -0400 ] DEBUG "[42c31484-8679-4c27-92ad-d8cd2fbed03b] CACHE Job Load (0.0ms) SELECT \"jobs\".* FROM \"jobs\" WHERE \"jobs\".\"workflow_id\" = ? ORDER BY \"jobs\".\"id\" DESC LIMIT ? [[\"workflow_id\", 1], [\"LIMIT\", 1]]"
App 5736 output: [2023-09-12 17:12:16 -0400 ] DEBUG "[42c31484-8679-4c27-92ad-d8cd2fbed03b] CACHE Job Load (0.0ms) SELECT \"jobs\".* FROM \"jobs\" WHERE \"jobs\".\"workflow_id\" = ? [[\"workflow_id\", 1]]"
App 5736 output: [2023-09-12 17:12:16 -0400 ] INFO "[42c31484-8679-4c27-92ad-d8cd2fbed03b] method=GET path=/pun/sys/myjobs/workflows/1.json format=json controller=WorkflowsController action=show status=200 duration=12.29 view=2.73 db=6.99"
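In the log, the ERROR line shows `sbatch` itself failing with `Unable to contact slurm controller (connect failure)` about nine seconds after the `execve` line (the request reports `duration=9014.72`). That delay is consistent with connection attempts timing out rather than being refused outright, though this is only an inference from the timestamps. One way to test it is a plain TCP connection from the OnDemand web server to slurmctld. The sketch below is a hypothetical helper, not an OnDemand or Slurm tool: it reads `SlurmctldHost` (or the older `ControlMachine`/`ControlAddr`) and `SlurmctldPort` from slurm.conf, falls back to Slurm's default port 6817, and tries to connect to each controller.

```python
from __future__ import annotations

import re
import socket
import sys
from pathlib import Path

DEFAULT_SLURMCTLD_PORT = 6817  # Slurm's default when SlurmctldPort is not set


def controller_endpoints(conf_text: str) -> list[tuple[str, int]]:
    """Return (host, port) pairs for slurmctld found in slurm.conf text.

    Simplifying assumptions: the keys below each sit on their own line, and only
    the first port of a SlurmctldPort range (e.g. 6817-6818) is tried.
    """
    hosts: list[str] = []
    port = DEFAULT_SLURMCTLD_PORT
    control_machine = control_addr = None
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        key = key.lower()  # slurm.conf parameter names are case-insensitive
        if key == "slurmctldhost":
            # SlurmctldHost=name or SlurmctldHost=name(address); prefer the address.
            match = re.fullmatch(r"([^()\s]+)(?:\((.+)\))?", value)
            if match:
                hosts.append(match.group(2) or match.group(1))
        elif key == "controlmachine":
            control_machine = value
        elif key == "controladdr":
            control_addr = value
        elif key == "slurmctldport":
            port = int(value.split("-", 1)[0])
    if not hosts and (control_addr or control_machine):
        hosts.append(control_addr or control_machine)  # pre-SlurmctldHost configs
    return [(host, port) for host in hosts]


def check(endpoints: list[tuple[str, int]], timeout: float = 5.0) -> None:
    """Try a plain TCP connect to each endpoint and report the result."""
    for host, port in endpoints:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                print(f"OK    {host}:{port}")
        except OSError as exc:
            print(f"FAIL  {host}:{port} ({exc})")


if __name__ == "__main__":
    path = Path(sys.argv[1] if len(sys.argv) > 1 else "/etc/slurm/slurm.conf")
    if path.is_file():
        check(controller_endpoints(path.read_text()))
    else:
        print(f"{path} not found; pass the slurm.conf path as the first argument")
```

An `OK` only shows the port is reachable; slurmctld still has to authenticate the request (normally via munge). If this reports `FAIL` from the web server but succeeds from the machine where the terminal test ran, a firewall rule or a different slurm.conf on the web server is one possible cause.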