You’re fine. My explanation was a bit wrong, so I went ahead and tested option 1 and confirmed the behavior.
Here is the flow of finding a job and querying for it, so please refer back to it if needed:
When OOD finds a job with `cluster_id = rka` (on any site: rid, rka, my cluster at OSC, wherever) it will attempt to create an adapter for this cluster id (rka). It looks for a file called `rka.yml` (because the cluster_id was rka; the cluster_id is the filename and the filename is the cluster_id, both when you create the job and when you go back to query for it) and tries (tries!) to create that adapter.
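For reference, here’s a minimal sketch of what that lookup means on disk (`/etc/ood/config/clusters.d/` is OOD’s standard location for cluster configs; the `title` value is just an assumption for illustration):

```yaml
# /etc/ood/config/clusters.d/rka.yml
# The filename minus .yml IS the cluster_id that OOD searches for.
v2:
  metadata:
    title: "RKA"   # display name (assumed for illustration)
  job:
    adapter: "lsf" # tells OOD which scheduler adapter to load
```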
(option 1)
If it can’t find the cluster configuration file (like you’ve logged into rid and it can’t find the rka configuration) it’ll get confused and create a panel for this job in an “Undetermined State”. That panel has a delete button, but the button won’t work and it says to contact support. OOD can’t delete the job because it doesn’t know how to: on rid it has no idea how to interact with the rka cluster, whether it’s Slurm or Torque or whatever.
(option 2)
If it does find the file `rka.yml`, it’ll read the configuration and use it.
(option 2 - bad)
Since this is LSF, it looks for `v2.job.cluster` to be populated. If it’s not populated, it won’t use the `-m` option. This is problematic because the `bjobs` command can execute successfully while LSF reports “that job doesn’t exist” (because you end up querying rid for an rka job), so OOD assumes the job is gone and deletes it.
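As a hedged sketch of that bad path (the job id is made up; “Job <id> is not found” is LSF’s standard reply for an unknown job):

```shell
# No v2.job.cluster, so no -m: this queries the local (rid) cluster
# for a job that actually lives on rka.
$ bjobs 123456
Job <123456> is not found
```

OOD reads that “not found” as the job being gone and removes the card.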
(option 2 - good)
Since this is LSF, it looks for `v2.job.cluster` to be populated. If it is, it’ll use it as the `-m` argument when running `bjobs`. So if the `rka.yml` file has `v2.job.cluster: "rka"`, it will submit a `bjobs` command with `-m rka`. This means you’ll be able to view RKA jobs on RID.
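Putting the good path together, a sketch of the config and the query it produces (job id again made up; any other flags OOD normally passes to `bjobs` are omitted):

```yaml
# /etc/ood/config/clusters.d/rka.yml on the rid host
v2:
  job:
    adapter: "lsf"
    cluster: "rka"   # populated, so OOD adds -m rka when querying
```

With that in place the status check becomes something like `bjobs -m rka 123456`, so RID’s OnDemand can see the RKA job instead of deleting it.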