I was trying to share a compute node between OnDemand and other SLURM queues, and I noticed that OnDemand doesn’t schedule jobs on any node that already has running jobs. Is that expected behavior?
What best practices would you recommend for allocating resources to OnDemand?
Hi, that shouldn’t be the case. OnDemand submits jobs with sbatch, just as you would from an SSH session or any other program.
You can check your logs for execv, where you can see the actual command we run. If you’ve configured some jobs (or the cluster configuration) to use a particular queue, then where they land depends on that queue and on the size of those jobs (memory and CPU).
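If you'd rather not scroll through the logs by hand, here is a minimal Python sketch that pulls out the lines mentioning execv. It assumes the per-user NGINX error log lives at /var/log/ondemand-nginx/<user>/error.log, which is a common location but may differ on your install, and that you have permission to read it. The function names are hypothetical; it only filters text and makes no assumption about the exact log format.

```python
from __future__ import annotations

from pathlib import Path
from typing import Iterable, Iterator

# ASSUMPTION: typical per-user NGINX (PUN) log root; adjust for your site.
DEFAULT_LOG_ROOT = Path("/var/log/ondemand-nginx")


def submission_lines(lines: Iterable[str], marker: str = "execv") -> Iterator[str]:
    """Yield log lines that contain the marker, without trailing newlines.

    "execv" also matches "execve", so either spelling in the log is caught.
    """
    for line in lines:
        if marker in line:
            yield line.rstrip("\n")


def read_submissions(user: str, log_root: Path = DEFAULT_LOG_ROOT) -> list[str]:
    """Return every execv line from one user's (assumed) error.log."""
    log_file = log_root / user / "error.log"
    with log_file.open(errors="replace") as fh:
        return list(submission_lines(fh))
```

Calling read_submissions("alice") on the web node (as a user who can read that file) should list the commands OnDemand ran for alice, including the resource flags each job requested.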
In any case, scheduling behavior is determined mostly by the scheduler itself. I’d suggest checking the resources the OnDemand job in question requests and the configuration of the queue in question. We’re just submitting jobs, not deciding (or caring) where they go.
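To make "check the resources you're requesting" concrete, here is a simplified Python model of the question the scheduler answers for a partially used node: do the job's CPUs and memory fit in what is left? This is a sketch, not Slurm's actual algorithm (which also weighs partition limits, priorities, and more), and all names are hypothetical. The shared=False case stands in for whole-node allocation, for example Slurm's select/linear plugin or exclusive jobs, where any running job blocks new ones.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeState:
    """Hypothetical, simplified view of one node's capacity and usage."""
    total_cpus: int
    total_mem_mb: int
    alloc_cpus: int = 0
    alloc_mem_mb: int = 0
    shared: bool = True  # False models whole-node allocation


@dataclass(frozen=True)
class Request:
    """Resources a job asks for (e.g. from its sbatch flags)."""
    cpus: int
    mem_mb: int


def fits(node: NodeState, req: Request) -> bool:
    """Return True if the request fits in the node's free CPUs and memory."""
    if not node.shared and (node.alloc_cpus or node.alloc_mem_mb):
        # Whole-node allocation: the node is busy as soon as anything runs.
        return False
    free_cpus = node.total_cpus - node.alloc_cpus
    free_mem_mb = node.total_mem_mb - node.alloc_mem_mb
    return req.cpus <= free_cpus and req.mem_mb <= free_mem_mb
```

For a hypothetical 28-core, 128,000 MB node with 24 cores and 100,000 MB already allocated, a 4-core/16,000 MB job fits, while an 8-core job or a 64,000 MB job does not; an OnDemand job that asks for more than what is left will wait for another node.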
In terms of how to allocate resources to OnDemand, I can only speak to OSC, where we do this:
- OnDemand can use any queue or cluster that a regular user could use. Most interactive apps schedule directly in the same cluster and queue as all other jobs.
- Virtual desktops can use what we call the ‘quick’ cluster, which is a single node whose software environment matches the regular cluster (for example, we have ‘owens’ and ‘owens-quick’). This lets folks quickly schedule virtual desktops on oversubscribed nodes (the job requests 1 core and 4 GB of RAM) and do interactive desktop work at no charge on a shared node. We only use the quick clusters for small virtual desktop jobs.
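To get a feel for how many of those 1-core/4 GB desktops one oversubscribed node can hold, here is a back-of-the-envelope Python sketch. The node sizes and oversubscription factor are hypothetical, and it assumes cores can be oversubscribed while memory cannot; your scheduler's sharing settings decide the real number.

```python
def max_desktops(
    node_cores: int,
    node_mem_gb: int,
    cores_per_job: int = 1,
    mem_per_job_gb: int = 4,
    oversubscribe: int = 1,
) -> int:
    """Estimate how many desktop jobs fit on one node.

    ASSUMPTION: cores are multiplied by a hypothetical oversubscription
    factor, while memory is a hard limit that is never oversubscribed.
    """
    if min(cores_per_job, mem_per_job_gb, oversubscribe) < 1:
        raise ValueError("per-job sizes and oversubscribe must be >= 1")
    by_cores = (node_cores * oversubscribe) // cores_per_job
    by_mem = node_mem_gb // mem_per_job_gb
    # Whichever resource runs out first sets the limit.
    return min(by_cores, by_mem)
```

With a hypothetical 28-core, 128 GB node, max_desktops(28, 128) gives 28 (core-bound), while max_desktops(28, 128, oversubscribe=4) gives 32 because memory becomes the limit. That is the trade-off behind keeping the quick node for small desktop jobs only.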
Hope that helps!