Workflow management sessions in OOD

We have a number of users of workflow managers such as Nextflow, and in principle other similar tools, who want a long-term, continuous presence on the cluster to submit and manage workflows.

NERSC has a really nice page that covers workflow managers in general and gives various commentary and tips. It links to a page on Nextflow that mentions some settings users can apply to avoid overwhelming the scheduler. NERSC also offers access to what they call “workflow nodes”, which appear to be older nodes in the cluster that they have dedicated to such tasks.

https://docs.nersc.gov/jobs/workflow-tools/
https://docs.nersc.gov/jobs/workflow/nextflow/

I am curious to know whether anyone has created OOD modules to support any workflow managers, Pegasus and Nextflow in particular. Are there ways to set up such “workflow management” sessions for long-term use with OOD, perhaps using tools like screen to manage disconnections and reconnections? Any thoughts on this topic?

Stepping over here from Twitter.

I lead the NERSC working group that studies workflow management tool usage on our systems, and I wrote that bit of documentation about Nextflow use on Cori.

The whole area is challenging. There are hundreds of tools available, so choice overload happens. Many tools assume an environment or a level of resource access that may not be available on an HPC system.

Starting with a green field, I would never suggest that someone planning to run at scale on our HPC systems use Nextflow. But I also recognize that users come in with existing pipelines, their people have existing tool experience, and rewriting that software is expensive, so doing the hacking to make a sub-optimal combination work could be the best outcome.