Problems when trying to use job scheduler (OOD 2.0)

Hello everyone,

When I try to submit a job via the Job Composer, I get this error after waiting for 30 seconds.

Failed to submit job
An error occurred when submitting jobs for simulation 1: Unable to run job: got no response from JSV script "/fast/gridengine/mdc/jsv_client".
Exiting.

Any idea why this could be happening? Also, is there a log where I can see the exact command that OOD uses for this job submission?

Best
Nico

Hey, sorry for the trouble.

To see the error logs for the submit command (qsub in your case, since you are on Grid Engine), check the per-user nginx logs:
/var/log/ondemand-nginx/<user>

The full list of logs is given here:
https://osc.github.io/ood-documentation/latest/how-tos/monitoring/logging.html#logging

I can’t tell from that error alone what is wrong, but seeing some of the nginx errors may help.
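
For anyone who wants to narrow a per-user log down to the interesting lines, here is a minimal Python sketch. It keeps only lines that record an executed command (logged as `execve = ...`, as in the log excerpt posted in this thread) or that are logged at ERROR level. The file name `error.log` and the function names are assumptions for illustration, not something confirmed in this thread; adjust them to your installation.

```python
import re
from pathlib import Path

# Per-user log directory as given in the reply above.
LOG_DIR = Path("/var/log/ondemand-nginx")

# Lines of interest: the command OOD executes ("execve = ...")
# and anything logged at ERROR level.
INTERESTING = re.compile(r"execve = |\]\s+ERROR\s")


def submit_related_lines(lines):
    """Return only the lines that show an executed command or an error."""
    return [line.rstrip("\n") for line in lines if INTERESTING.search(line)]


def scan_user_log(user, filename="error.log"):
    """Filter one user's log file; the file name is an assumption."""
    path = LOG_DIR / user / filename
    with path.open(errors="replace") as fh:
        return submit_related_lines(fh)
```

Calling `scan_user_log("<user>")` with a real user name, as a user allowed to read that directory, returns the matching lines as a list of strings.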

Hey thanks for the fast reply,

I’ve taken a look at the logs, but I don’t see anything suspicious.
The only error I can see is the one I mentioned before with the JSV script.
Below is the log from the moment I pressed Submit in the Job Composer.

App 26076 output: [2023-05-03 11:34:58 +0200 ] DEBUG "\e[1m\e[36mWorkflow Load (5.6ms)\e[0m  \e[1m\e[34mSELECT \"workflows\".* FROM \"workflows\" INNER JOIN \"jobs\" ON \"jobs\".\"workflow_id\" = \"workflows\".\"id\" WHERE \"jobs\".\"status\" IN (?, ?, ?, ?)\e[0m  [[\"status\", \"H\"], [\"status\", \"Q\"], [\"status\", \"R\"], [\"status\", \"S\"]]"
App 26076 output: [2023-05-03 11:34:58 +0200 ] DEBUG "\e[1m\e[36mWorkflow Load (4.7ms)\e[0m  \e[1m\e[34mSELECT  \"workflows\".* FROM \"workflows\" WHERE \"workflows\".\"id\" = ? LIMIT ?\e[0m  [[\"id\", 2], [\"LIMIT\", 1]]"
App 26076 output: [2023-05-03 11:34:58 +0200 ] DEBUG "\e[1m\e[36mJob Load (4.6ms)\e[0m  \e[1m\e[34mSELECT \"jobs\".* FROM \"jobs\" WHERE \"jobs\".\"workflow_id\" = ?\e[0m  [[\"workflow_id\", 2]]"
App 26076 output: [2023-05-03 11:34:58 +0200 ] DEBUG "\e[1m\e[36mCACHE Workflow Load (0.0ms)\e[0m  \e[1m\e[34mSELECT  \"workflows\".* FROM \"workflows\" WHERE \"workflows\".\"id\" = ? LIMIT ?\e[0m  [[\"id\", 2], [\"LIMIT\", 1]]"
App 26076 output: [2023-05-03 11:34:58 +0200 ] DEBUG "\e[1m\e[36mJob Load (4.5ms)\e[0m  \e[1m\e[34mSELECT  \"jobs\".* FROM \"jobs\" WHERE \"jobs\".\"workflow_id\" = ? ORDER BY \"jobs\".\"id\" DESC LIMIT ?\e[0m  [[\"workflow_id\", 2], [\"LIMIT\", 1]]"
App 26076 output: [2023-05-03 11:34:58 +0200 ] DEBUG "\e[1m\e[36mCACHE Job Load (0.0ms)\e[0m  \e[1m\e[34mSELECT  \"jobs\".* FROM \"jobs\" WHERE \"jobs\".\"workflow_id\" = ? ORDER BY \"jobs\".\"id\" DESC LIMIT ?\e[0m  [[\"workflow_id\", 2], [\"LIMIT\", 1]]"
App 26076 output: [2023-05-03 11:34:58 +0200 ] DEBUG "\e[1m\e[36mCACHE Job Load (0.0ms)\e[0m  \e[1m\e[34mSELECT \"jobs\".* FROM \"jobs\" WHERE \"jobs\".\"workflow_id\" = ?\e[0m  [[\"workflow_id\", 2]]"
App 26076 output: [2023-05-03 11:34:58 +0200 ]  INFO "method=GET path=/pun/sys/myjobs/workflows/2 format=json controller=WorkflowsController action=show status=200 duration=27.16 view=3.75 db=19.44"
App 26076 output: [2023-05-03 11:34:59 +0200 ] DEBUG "\e[1m\e[36mWorkflow Load (4.5ms)\e[0m  \e[1m\e[34mSELECT  \"workflows\".* FROM \"workflows\" WHERE \"workflows\".\"id\" = ? LIMIT ?\e[0m  [[\"id\", 2], [\"LIMIT\", 1]]"
App 26076 output: [2023-05-03 11:34:59 +0200 ] DEBUG "\e[1m\e[36mJob Load (4.7ms)\e[0m  \e[1m\e[34mSELECT \"jobs\".* FROM \"jobs\" WHERE \"jobs\".\"workflow_id\" = ?\e[0m  [[\"workflow_id\", 2]]"
App 26076 output: [2023-05-03 11:34:59 +0200 ]  INFO "execve = [{}, \"/fast/gridengine/latest/bin/lx-amd64/qsub\"]"
App 26076 output: [2023-05-03 11:35:21 +0200 ] ERROR "An error occurred when submitting jobs for simulation 2: Unable to run job: got no response from JSV script \"/fast/gridengine/mdc/jsv_client\".\nExiting."
App 26076 output: [2023-05-03 11:35:21 +0200 ]  INFO "method=PUT path=/pun/sys/myjobs/workflows/2/submit format=html controller=WorkflowsController action=submit status=302 duration=21746.14 view=0.00 db=9.12 location=https://sl-it-t-ood2.mdc-berlin.net/pun/sys/myjobs/workflows"
App 26076 output: [2023-05-03 11:35:21 +0200 ] DEBUG "\e[1m\e[36mWorkflow Load (13.6ms)\e[0m  \e[1m\e[34mSELECT \"workflows\".* FROM \"workflows\" INNER JOIN \"jobs\" ON \"jobs\".\"workflow_id\" = \"workflows\".\"id\" WHERE \"jobs\".\"status\" IN (?, ?, ?, ?)\e[0m  [[\"status\", \"H\"], [\"status\", \"Q\"], [\"status\", \"R\"], [\"status\", \"S\"]]"
App 26076 output: [2023-05-03 11:35:21 +0200 ] DEBUG "\e[1m\e[35mSQL (5.0ms)\e[0m  \e[1m\e[34mSELECT \"workflows\".\"id\" AS t0_r0, \"workflows\".\"created_at\" AS t0_r1, \"workflows\".\"updated_at\" AS t0_r2, \"workflows\".\"job_attrs\" AS t0_r3, \"workflows\".\"name\" AS t0_r4, \"workflows\".\"batch_host\" AS t0_r5, \"workflows\".\"staged_dir\" AS t0_r6, \"workflows\".\"script_name\" AS t0_r7, \"jobs\".\"id\" AS t1_r0, \"jobs\".\"workflow_id\" AS t1_r1, \"jobs\".\"status\" AS t1_r2, \"jobs\".\"job_cache\" AS t1_r3, \"jobs\".\"created_at\" AS t1_r4, \"jobs\".\"updated_at\" AS t1_r5 FROM \"workflows\" LEFT OUTER JOIN \"jobs\" ON \"jobs\".\"workflow_id\" = \"workflows\".\"id\"\e[0m"
App 26076 output: [2023-05-03 11:35:21 +0200 ]  INFO "method=GET path=/pun/sys/myjobs/workflows format=html controller=WorkflowsController action=index status=200 duration=26.21 view=4.94 db=18.59"
App 26076 output: [2023-05-03 11:35:21 +0200 ] DEBUG "\e[1m\e[36mWorkflow Load (5.1ms)\e[0m  \e[1m\e[34mSELECT \"workflows\".* FROM \"workflows\" INNER JOIN \"jobs\" ON \"jobs\".\"workflow_id\" = \"workflows\".\"id\" WHERE \"jobs\".\"status\" IN (?, ?, ?, ?)\e[0m  [[\"status\", \"H\"], [\"status\", \"Q\"], [\"status\", \"R\"], [\"status\", \"S\"]]"
App 26076 output: [2023-05-03 11:35:21 +0200 ] DEBUG "\e[1m\e[36mWorkflow Load (4.7ms)\e[0m  \e[1m\e[34mSELECT  \"workflows\".* FROM \"workflows\" WHERE \"workflows\".\"id\" = ? LIMIT ?\e[0m  [[\"id\", 2], [\"LIMIT\", 1]]"
App 26076 output: [2023-05-03 11:35:21 +0200 ] DEBUG "\e[1m\e[36mJob Load (4.4ms)\e[0m  \e[1m\e[34mSELECT \"jobs\".* FROM \"jobs\" WHERE \"jobs\".\"workflow_id\" = ?\e[0m  [[\"workflow_id\", 2]]"
App 26076 output: [2023-05-03 11:35:21 +0200 ] DEBUG "\e[1m\e[36mCACHE Workflow Load (0.0ms)\e[0m  \e[1m\e[34mSELECT  \"workflows\".* FROM \"workflows\" WHERE \"workflows\".\"id\" = ? LIMIT ?\e[0m  [[\"id\", 2], [\"LIMIT\", 1]]"
App 26076 output: [2023-05-03 11:35:21 +0200 ] DEBUG "\e[1m\e[36mJob Load (4.6ms)\e[0m  \e[1m\e[34mSELECT  \"jobs\".* FROM \"jobs\" WHERE \"jobs\".\"workflow_id\" = ? ORDER BY \"jobs\".\"id\" DESC LIMIT ?\e[0m  [[\"workflow_id\", 2], [\"LIMIT\", 1]]"
App 26076 output: [2023-05-03 11:35:21 +0200 ] DEBUG "\e[1m\e[36mCACHE Job Load (0.0ms)\e[0m  \e[1m\e[34mSELECT  \"jobs\".* FROM \"jobs\" WHERE \"jobs\".\"workflow_id\" = ? ORDER BY \"jobs\".\"id\" DESC LIMIT ?\e[0m  [[\"workflow_id\", 2], [\"LIMIT\", 1]]"
App 26076 output: [2023-05-03 11:35:21 +0200 ] DEBUG "\e[1m\e[36mCACHE Job Load (0.0ms)\e[0m  \e[1m\e[34mSELECT \"jobs\".* FROM \"jobs\" WHERE \"jobs\".\"workflow_id\" = ?\e[0m  [[\"workflow_id\", 2]]"
App 26076 output: [2023-05-03 11:35:21 +0200 ]  INFO "method=GET path=/pun/sys/myjobs/workflows/2 format=json controller=WorkflowsController action=show status=200 duration=24.69 view=4.18 db=18.76"
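
As a rough cross-check of the log above, the time between the `execve` line (where `qsub` is started) and the ERROR line can be computed from the two timestamps. This is only a sketch that assumes the timestamp layout seen in these lines; the function names are made up for illustration, and the two sample lines are abbreviated copies of the real ones.

```python
import re
from datetime import datetime

# Matches the bracketed timestamp used in the log lines above,
# e.g. "[2023-05-03 11:34:59 +0200 ]".
TIMESTAMP = re.compile(r"\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} [+-]\d{4})\s*\]")


def parse_timestamp(line):
    """Extract the timezone-aware timestamp from one log line."""
    match = TIMESTAMP.search(line)
    if match is None:
        raise ValueError(f"no timestamp found in: {line!r}")
    return datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S %z")


def seconds_between(start_line, end_line):
    """Elapsed seconds from start_line to end_line."""
    delta = parse_timestamp(end_line) - parse_timestamp(start_line)
    return delta.total_seconds()


# Abbreviated copies of the execve and ERROR lines from the log above.
started = 'App 26076 output: [2023-05-03 11:34:59 +0200 ]  INFO "execve = ..."'
failed = 'App 26076 output: [2023-05-03 11:35:21 +0200 ] ERROR "An error occurred ..."'
print(seconds_between(started, failed))  # 22.0
```

The 22 seconds match the `duration=21746.14` reported for the PUT request, so nearly all of the request time was spent inside the `qsub` call.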

Best
Nico
