Show Projected Start time on Interactive Sessions page

When submitting jobs through OOD, many of our users are confused about what to do when they observe a job in the “Queued” state for a long time.

It would be helpful to have the Projected Start Time (from squeue --start output) shown in the card for the job.

This could take the same spot where “Time Remaining” is for running jobs.

How would you deal with the edge cases, such as a start that depends on something else? Many of the jobs “just queued” are waiting on a resource and have N/A in the START_TIME column; e.g.

          14439049  standard st_archi     <username> PD                 N/A      1 (null)               (DependencyNeverSatisfied)

Would you have OOD display the “DependencyNeverSatisfied” string as the Projected Start Time?
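One possible answer, as a rough sketch only: show the time when Slurm gives one, and otherwise show the reason next to a "not available" label. `projected_start_label` is a hypothetical helper, not something OOD provides, and it assumes the two inputs come from squeue's `%S` (start time) and `%r` (reason) fields.

```ruby
require 'time'

# Hypothetical helper (not part of OOD): decide what to show in the
# "Projected Start Time" slot of a queued job's card.
# start_field  - squeue's %S value, e.g. "2024-05-01T13:30:00" or "N/A"
# reason_field - squeue's %r value, e.g. "Resources" or "DependencyNeverSatisfied"
def projected_start_label(start_field, reason_field)
  start = begin
    Time.parse(start_field.to_s)
  rescue ArgumentError
    nil # "N/A", empty, or otherwise unparseable
  end
  return start.strftime("%B %d, %Y at %I:%M %p") if start

  # The default squeue layout wraps the reason in parentheses; strip them.
  reason = reason_field.to_s.delete("()").strip
  return "Not available" if reason.empty? || reason == "None"
  "Not available (reason: #{reason})"
end
```

With the job above, `projected_start_label("N/A", "(DependencyNeverSatisfied)")` returns `"Not available (reason: DependencyNeverSatisfied)"`, so the user at least sees why there is no estimate.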

We’re also interested in this. It’s on my TODO list to try to implement something locally, but I currently have no idea when I’ll have time to try. I feel like I’ve seen other recent discussions about this.

Ric, we’re especially concerned with edge cases. We’ve tried to prevent them with checks on the forms, but users still frequently submit jobs that can never start, and the Interactive Sessions page gives them no way of seeing that.

I think it’s an implementation + docs issue. Basically, show users whatever Slurm says, plus a link to docs explaining what it means. How many jobs have a predicted start time also depends heavily on the local Slurm configuration. I think I would add “eligible” times as well.
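As a sketch of the "show whatever Slurm says, plus a docs link" idea: the snippet below assumes the text comes from `scontrol show job <id>`, which prints `Key=Value` pairs including `JobState`, `Reason`, `EligibleTime`, and `StartTime`. The helper names `parse_scontrol` and `queue_summary` are hypothetical, and the sample text is hand-written for illustration rather than captured from a real cluster.

```ruby
# Docs link used later in this thread; a site could point at its own docs instead.
STATE_DOCS_URL = "https://slurm.schedmd.com/squeue.html#SECTION_JOB-STATE-CODES"

# Hypothetical helper: turn `scontrol show job` text into a Hash of its
# Key=Value pairs. Values that contain spaces are cut at the first space.
def parse_scontrol(text)
  text.scan(/(\S+?)=(\S*)/).to_h
end

# Build the lines a session card could show, passing Slurm's values through as-is.
def queue_summary(fields)
  [
    "Job Status: #{fields.fetch('JobState', 'Unknown')}",
    "Reason: #{fields.fetch('Reason', 'Unknown')}",
    "Eligible Time: #{fields.fetch('EligibleTime', 'Unknown')}",
    "Predicted Start Time: #{fields.fetch('StartTime', 'Unknown')}",
    "What these values mean: #{STATE_DOCS_URL}"
  ]
end

# Illustrative, hand-written sample in the Key=Value shape (not real output).
sample = <<~OUT
  JobId=14439049 JobName=st_archi
     JobState=PENDING Reason=DependencyNeverSatisfied
     EligibleTime=Unknown StartTime=Unknown
OUT

puts queue_summary(parse_scontrol(sample))
```

Passing the values through untouched keeps the card honest about what the scheduler reported, and leaves the interpretation to the linked documentation.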


see also: Custom notices on session card while a job is starting

Here’s a start. Tighter integration into the cards would be more work, but this lets you show some extra info to users without any significant changes. It would go in a custom info.md.erb on a per-app basis.

<%-
require 'open3'
require 'time'

class CheckJob

  # Per-user file cache so squeue runs at most about once a minute per job.
  @cache = ActiveSupport::Cache::FileStore.new("/users/#{User.new.name}/.cache/OpenOnDemand/", :expires_in => 60.seconds)

  # Returns [state, start_time] from squeue, or [] if squeue gave no usable output.
  def self.CheckJob(job_id)
    # Pass an argument list (no shell) so job_id is never interpolated into a command string.
    o, status = Open3.capture2e("/hpc/sys/apps/slurm/current/bin/squeue", "-j", job_id.to_s, "-ho", "%T,%S")
    return [] unless status.success?
    o.lines.first.to_s.strip.split(',')
  end

  # Cached wrapper around CheckJob.
  def self.GetJobState(job_id)
    @cache.fetch("#{User.new.name}/queues/#{job_id}", race_condition_ttl: 30.seconds) do
      self.CheckJob(job_id)
    end
  end
end

def valid_time_string?(time_string)
  Time.parse(time_string)
  true
rescue ArgumentError, TypeError
  # ArgumentError: unparseable text such as "N/A"; TypeError: nil
  false
end

-%>

<%- if queued? -%>
> **Job Status**: <%= CheckJob.GetJobState(job_id)[0] %>

> **Predicted or Actual Start Time**: 
<%- if valid_time_string?(CheckJob.GetJobState(job_id)[1]) -%>
<%= Time.parse(CheckJob.GetJobState(job_id)[1]).strftime("%B %d, %Y at %I:%M %p") %>
<%- else -%>
Unknown
<%- end -%>

> For an explanation of the job status values, see https://slurm.schedmd.com/squeue.html#SECTION_JOB-STATE-CODES

> Predicted start times are based on job requests currently in the queue. They are not available for all jobs and
are often not accurate. Jobs usually, but not always, start before the predicted time. 
<%- end -%>



There are probably better ways to do this, and you may need to modify how I set up the cache, the path to the Slurm commands, etc.
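For the path to the Slurm commands, one option is to look on `PATH` first and fall back to a site-specific location. This is only a sketch: `locate_command` is a hypothetical helper, the fallback is the path from the snippet above, and the web app's environment may not have Slurm on `PATH` at all.

```ruby
# Hypothetical helper: find an executable by name on a PATH-style string,
# falling back to a site-specific default when it is not found.
def locate_command(name, fallback, path = ENV.fetch("PATH", ""))
  path.split(File::PATH_SEPARATOR).each do |dir|
    next if dir.empty?
    candidate = File.join(dir, name)
    return candidate if File.file?(candidate) && File.executable?(candidate)
  end
  fallback
end

# The fallback is the location used in the snippet above; adjust per site.
SQUEUE = locate_command("squeue", "/hpc/sys/apps/slurm/current/bin/squeue")
```

`SQUEUE` could then replace the hardcoded path passed to `Open3.capture2e`.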



Thanks for this!

Instead of hardcoding the cache path, I changed it so it uses ~/ondemand/.cache:

  @cache = ActiveSupport::Cache::FileStore.new(
    File.join(ENV["HOME"], "ondemand", ".cache"),
    expires_in: 60.seconds
  )