OOD 3.0.1 and 'undetermined' info when looking at cluster job info

Hi, I’m trying out OOD 3.0.1 and just about have everything working, but when I try to get more detailed job info in the Jobs display, everything shows as ‘undetermined’ when I expand the info panel. In the PUN error log, I have this:

App 775340 output: [2023-08-14 07:21:13 -0400 ]  INFO "method=GET path=/pun/sys/dashboard/activejobs/json format=json controller=ActiveJobsController action=json status=200 duration=23.94 view=1.32"
App 775340 output: [2023-08-14 07:22:14 -0400 ]  INFO "execve = [{\"SLURM_CONF\"=>\"/etc/slurm/slurm.conf\"}, \"/bin/squeue\", \"--all\", \"--states=all\", \"--noconvert\", \"-o\", \"\\u001E%a\\u001F%A\\u001F%B\\u001F%c\\u001F%C\\u001F%d\\u001F%D\\u001F%e\\u001F%E\\u001F%f\\u001F%F\\u001F%g\\u001F%G\\u001F%h\\u001F%H\\u001F%i\\u001F%I\\u001F%j\\u001F%J\\u001F%k\\u001F%K\\u001F%l\\u001F%L\\u001F%m\\u001F%M\\u001F%n\\u001F%N\\u001F%o\\u001F%O\\u001F%q\\u001F%P\\u001F%Q\\u001F%r\\u001F%S\\u001F%t\\u001F%T\\u001F%u\\u001F%U\\u001F%v\\u001F%V\\u001F%w\\u001F%W\\u001F%x\\u001F%X\\u001F%y\\u001F%Y\\u001F%z\\u001F%Z\\u001F%b\", \"-j\", \"4165223\"]"
App 775340 output: [2023-08-14 07:22:14 -0400 ]  INFO "invalid date:invalid date"
App 775340 output: [2023-08-14 07:22:14 -0400 ]  INFO "/var/www/ood/apps/sys/dashboard/app/models/active_jobs/jobstatusdata.rb:124:in `parse'\n/var/www/ood/apps/sys/dashboard/app/models/active_jobs/jobstatusdata.rb:124:in `extended_data_slurm'\n/var/www/ood/apps/sys/dashboard/app/models/active_jobs/jobstatusdata.rb:45:in `initialize'\n/var/www/ood/apps/sys/dashboard/app/controllers/active_jobs_controller.rb:79:in `new'\n/var/www/ood/apps/sys/dashboard/app/controllers/active_jobs_controller.rb:79:in `get_job'\n/var/www/ood/apps/sys/dashboard/app/controllers/active_jobs_controller.rb:32:in `block (2 levels) in json'\n/opt/ood/ondemand/root/usr/share/gems/3.0/ondemand/3.0.1-1/gems/actionpack-6.1.7.3/lib/action_controller/metal/mime_responds.rb:214:in `respond_to'\n/var/www/ood/apps/sys/dashboard/app/controllers/active_jobs_controller.rb:25:in `json'\n/opt/ood/ondemand/root/usr/share/gems/3.0/ondemand/3.0.1-1/gems/actionpack-6.1.7.3/lib/action_controller/metal/basic_implicit_render.rb:6:in `send_action'\n/opt/ood/ondemand/root/usr/share/gems/3.0/ondemand/3.0.1-1/gems/actionpack-6.1.7.3/lib/abstract_controller/base.rb:228:in `process_action'\n/opt/ood/ondemand/root/usr/share/gems/3.0/ondemand/3.0.1-1/gems/actionpack-6.1.7.3/lib/action_controller/metal/rendering.rb:30:in `process_action'\n/opt/ood/ondemand/root/usr/share/gems/3.0/ondemand/3.0.1-1/gems/actionpack-6.1.7.3/lib/abstract_controller/callbacks.rb:42:in `block in process_action'\n/opt/ood/ondemand/root/usr/share/gems/3.0/ondemand/3.0.1-1/gems/activesupport-6.1.7.3/lib/active_support/callbacks.rb:106:in `run_callbacks'\n/opt/ood/ondemand/root/usr/share/gems/3.0/ondemand/3.0.1-1/gems/actionpack-6.1.7.3/lib/abstract_controller/callbacks.rb:41:in `process_action'\n/opt/ood/ondemand/root/usr/share/gems/3.0/ondemand/3.0.1-1/gems/actionpack-6.1.7.3/lib/action_controller/metal/rescue.rb:22:in `process_action'\n/opt/ood/ondemand/root/usr/share/gems/3.0/ondemand/3.0.1-1/gems/actionpack-6.1.7.3/lib/action_controller/metal/instrumentation.rb:34:in `block in process_action'\n/opt/ood/ondemand/root/usr/share/gems/3.0/ondemand/3.0.1-1/gems/activesupport-6.1.7.3/lib/active_support/notifications.rb:203:in `block in instrument'\n/opt/ood/ondemand/root/usr/share/gems/3.0/ondemand/3.0.1-1/gems/activesupport-6.1.7.3/lib/active_support/notifications/instrumenter.rb:24:in `instrument'\n/opt/ood/ondemand/root/usr/share/gems/3.0/ondemand/3.0.1-1/gems/activesupport-6.1.7.3/lib/active_support/notifications.rb:203:in `instrument'\n/opt/ood/ondemand/root/usr/share/gems/3.0/ondemand/3.0.1-1/gems/actionpack-6.1.7.3/lib/action_controller/metal/instrumentation.rb:33:in `process_action'\n/opt/ood/ondemand/root/usr/share/gems/3.0/ondemand/3.0.1-1/gems/actionpack-6.1.7.3/lib/action_controller/metal/params_wrapper.rb:249:in `process_action'\n/opt/ood/ondemand/root/usr/share/gems/3.0/ondemand/3.0.1-1/gems/actionpack-6.1.7.3/lib/abstract_controller/base.rb:165:in `process'\n/opt/ood/ondemand/root/usr/share/gems/3.0/ondemand/3.0.1-1/gems/actionview-6.1.7.3/lib/action_view/rendering.rb:39:in `process'\n/opt/ood/ondemand/root/usr/share/gems/3.0/ondemand/3.0.1-1/gems/actionpack-6.1.7.3/lib/action_controller/metal/live.rb:261:in `block (2 levels) in process'\n/opt/ood/ondemand/root/usr/share/gems/3.0/ondemand/3.0.1-1/gems/activesupport-6.1.7.3/lib/active_support/dependencies/interlock.rb:42:in `block in running'\n/opt/ood/ondemand/root/usr/share/gems/3.0/ondemand/3.0.1-1/gems/activesupport-6.1.7.3/lib/active_support/concurrency/share_lock.rb:162:in 
`sharing'\n/opt/ood/ondemand/root/usr/share/gems/3.0/ondemand/3.0.1-1/gems/activesupport-6.1.7.3/lib/active_support/dependencies/interlock.rb:41:in `running'\n/opt/ood/ondemand/root/usr/share/gems/3.0/ondemand/3.0.1-1/gems/actionpack-6.1.7.3/lib/action_controller/metal/live.rb:253:in `block in process'\n/opt/ood/ondemand/root/usr/share/gems/3.0/ondemand/3.0.1-1/gems/actionpack-6.1.7.3/lib/action_controller/metal/live.rb:303:in `block in new_controller_thread'"

It seems like there’s something it doesn’t like about the date that Slurm is supplying.
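
For what it’s worth, that message looks consistent with Ruby’s date parsing raising on a string it can’t turn into a date. A minimal sketch (not the actual dashboard code; "foo" is just a stand-in, since the offending value isn’t shown in the log):

    require 'date'

    # A normal Slurm timestamp parses fine:
    DateTime.parse("2023-08-12T06:47:13")

    # ...but a string with no recognizable date in it raises
    # ArgumentError, matching the "invalid date" message in the PUN log above.
    begin
      DateTime.parse("foo")
    rescue ArgumentError => e
      puts e.message   # => invalid date
    end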

Can you issue that same command from a terminal and see what the output is? I can hop on my side and see if there’s a bug. You can use a different separator; in fact, you may be able to just find which fields are the date fields and get their output. I’ll try the same on my side and see what the time format is.
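
For example, something like this (format codes taken from the squeue man page; swap in a real job id for the placeholder) should print just the time-related fields:

    squeue --all --states=all --noconvert -o "%i %S %e %V %l %M" -j <jobid>

Here %i is the job id, %S the start time, %e the end time, %V the submit time, %l the time limit, and %M the time used.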

Also let me know the version of Slurm you’re running.

We’re using Slurm 22.05.2. Here’s the output for two jobs where I ran the squeue command directly. The first job is someone else’s and produces ‘undetermined’ in the job list page in OOD. The second is a job that I ran, and its info DOES appear in the job list page in OOD.

squeue --all --states=all --noconvert -o %a,%A,%B,%c,%C,%d,%e,%E,%f,%F,%g,%G,%h,%H,%i,%I,%j,%J,%k,%K,%l,%L,%m,%M,%n,%N,%o,%O,%q,%P,%Q,%r,%S,%t,%T,%u,%U,%v,%V,%w,%W,%x,%X,%y,%Y,%z,%Z,%b -j 4165417
ACCOUNT,JOBID,EXEC_HOST,MIN_CPUS,CPUS,MIN_TMP_DISK,END_TIME,DEPENDENCY,FEATURES,ARRAY_JOB_ID,GROUP,GROUP,OVER_SUBSCRIBE,SOCKETS_PER_NODE,JOBID,CORES_PER_SOCKET,NAME,THREADS_PER_CORE,COMMENT,ARRAY_TASK_ID,TIME_LIMIT,TIME_LEFT,MIN_MEMORY,TIME,REQ_NODES,NODELIST,COMMAND,CONTIGUOUS,QOS,PARTITION,PRIORITY,REASON,START_TIME,ST,STATE,USER,UID,RESERVATION,SUBMIT_TIME,WCKEY,LICENSES,EXC_NODES,CORE_SPEC,NICE,SCHEDNODES,S:C:T,WORK_DIR,TRES_PER_NODE
root,4165417,qat-dgx1,24,24,0,NONE,(null),(null),4165417,groupname,1064949987,OK,*,4165417,*,jobname1,*,(null),N/A,UNLIMITED,UNLIMITED,4096M,2-03:09:13,,dgx1,trainingjob.sh,0,normal,dgx,1739,None,2023-08-12T06:47:13,R,RUNNING,user1,260777201,(null),2023-08-12T06:47:12,*,(null),,N/A,0,(null),*:*:*,/clusterdir,N/A
----------------------
squeue --all --states=all --noconvert -o %a,%A,%B,%c,%C,%d,%e,%E,%f,%F,%g,%G,%h,%H,%i,%I,%j,%J,%k,%K,%l,%L,%m,%M,%n,%N,%o,%O,%q,%P,%Q,%r,%S,%t,%T,%u,%U,%v,%V,%w,%W,%x,%X,%y,%Y,%z,%Z,%b -j 4165493
ACCOUNT,JOBID,EXEC_HOST,MIN_CPUS,CPUS,MIN_TMP_DISK,END_TIME,DEPENDENCY,FEATURES,ARRAY_JOB_ID,GROUP,GROUP,OVER_SUBSCRIBE,SOCKETS_PER_NODE,JOBID,CORES_PER_SOCKET,NAME,THREADS_PER_CORE,COMMENT,ARRAY_TASK_ID,TIME_LIMIT,TIME_LEFT,MIN_MEMORY,TIME,REQ_NODES,NODELIST,COMMAND,CONTIGUOUS,QOS,PARTITION,PRIORITY,REASON,START_TIME,ST,STATE,USER,UID,RESERVATION,SUBMIT_TIME,WCKEY,LICENSES,EXC_NODES,CORE_SPEC,NICE,SCHEDNODES,S:C:T,WORK_DIR,TRES_PER_NODE
root,4165493,aos-cl-c33,1,4,0,2023-08-14T15:55:07,(null),(null),4165493,domain.users,14400513,OK,*,4165493,*,test.sh,*,(null),N/A,6:00:00,5:58:12,1024M,1:48,,aos-cl-c33,test.sh,0,normal,shortqueue,16859,None,2023-08-14T09:55:07,R,RUNNING,user2,14444725,(null),2023-08-14T09:55:07,ktest,(null),,N/A,0,(null),*:*:*,/home/user2home,N/A

I do notice that field 7(?) on the other user’s job shows NONE, while on mine it shows a date, so that could be the cause of the problem. However, on OOD 2.0.32 that doesn’t stop the job info from appearing, so something must be stricter now in 3.0.

That appears to be the ‘end time’ (%e), so it makes sense that it shows NONE for some jobs and not others. Besides that, the other timestamps in the output appear in both commands.
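
A quick way to double-check the field numbering is to split the header of the same command into numbered lines:

    squeue --all --states=all --noconvert -o %a,%A,%B,%c,%C,%d,%e,%E,%f,%F,%g,%G,%h,%H,%i,%I,%j,%J,%k,%K,%l,%L,%m,%M,%n,%N,%o,%O,%q,%P,%Q,%r,%S,%t,%T,%u,%U,%v,%V,%w,%W,%x,%X,%y,%Y,%z,%Z,%b -j 4165417 | head -n1 | tr ',' '\n' | cat -n

That numbers ACCOUNT as field 1, JOBID as 2, and so on; END_TIME comes out as field 7, which is the %e column showing NONE on the first job.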

I see you marked a solution, though I think you found a bug.

I tracked this down to this commit, where we’re looking for N/A but you have NONE. It appears to be a bug on our side.

I’ll file a bug ticket now and try to push an update in 3.0.2.
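
In the meantime, a workaround pattern along these lines (just a sketch, not the actual dashboard code) would be to treat both spellings as “no end time” before handing the value to the date parser:

    require 'date'

    # Slurm can report either "N/A" or "NONE" when a job has no end time,
    # so map both to nil instead of letting DateTime.parse raise.
    NO_END_TIME = ["N/A", "NONE"].freeze

    def parse_end_time(raw)
      return nil if NO_END_TIME.include?(raw.to_s.strip)
      DateTime.parse(raw)
    end

    parse_end_time("NONE")                 # => nil
    parse_end_time("2023-08-14T15:55:07")  # => #<DateTime: 2023-08-14T15:55:07 ...>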

I think I must have accidentally marked a solution; I didn’t mean to.

Oh OK, no problem. I’m trying to track down why it would say NONE, or how to get Slurm to say NONE.

I can go ahead and add it, but I’d like to replicate it first.

The squeue docs say this about end time. I guess I just have to figure out how to submit an ‘invalid’ job?

EndTime
    The time of job termination, actual or expected. (Valid for jobs only) 

OK, I think I see the situation that causes that. The job I submitted in the earlier example went to a queue that had a time limit, so an end time was set (it should have been 6 hours after I kicked it off). When I submitted the same test job to one of our queues with no time limit, the end time was set to NONE.
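
If it helps for replicating on your side, something like this should do it, assuming a partition with no default or maximum time limit (the partition name and job id are placeholders):

    # Submit a trivial job to a partition without a time limit...
    sbatch --partition=<no-limit-partition> --wrap="sleep 600"

    # ...then check the end time and time limit for the resulting job id:
    squeue --all --states=all --noconvert -o "%i %e %l" -j <jobid>

In that case END_TIME comes back as NONE and TIME_LIMIT as UNLIMITED, which matches the first job in the output above.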
