I am doing some testing of the auto_queues feature and there is some behavior I don’t fully understand.
We have users that can have access to cpu (and cpu partitions), gpu (gpu partitions) or both, based on slurm.
Let’s say I am a user with only cpu access. If I use sinfo, I see only cpu partitions, and on the ondemand remote desktop I also see only cpu partitions, as intended. Now, if I change my access to gpu only, sinfo shows only gpu partitions, but when I log back in to the ondemand remote desktop I still see cpu partitions. If I then reboot the ondemand node, I see only gpu partitions, as intended.
So, I am wondering how exactly the partitions are cached between connections, and how to force the cache to update. I first thought of the file ~/ondemand/data/sys/dashboard/batch_connect/cache/sys_bc_desktop.json, but since I never fully reserved a node with that user, this file doesn’t exist. Plus, when it does exist, it only contains one partition in the auto_queues field, not the full list.
Hi and thanks for the question! The sinfo command that gathers this up can be seen here in ood_core:
So, there is a call to sinfo -aho %A/%D/%C which is then processed and stored in an array for later access for CPU info. Then there is another call with information about the gpu stored in gres_lines. So, this is the initial place that information is gathered.
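As a rough illustration of the processing step, here is a minimal Python sketch (not the ood_core code, which is Ruby) that splits one line of `sinfo -aho %A/%D/%C` output into named fields. Per the sinfo man page, `%A` prints allocated/idle node counts, `%D` the node count, and `%C` allocated/idle/other/total CPU counts. The `ClusterSummary` type, the `parse_sinfo_summary` name, and the sample numbers are made up for the example, and the sketch assumes every field is a plain integer.

```python
from typing import NamedTuple


class ClusterSummary(NamedTuple):
    """Hypothetical container for one `sinfo -aho %A/%D/%C` line."""
    allocated_nodes: int
    idle_nodes: int
    total_nodes: int
    allocated_cpus: int
    idle_cpus: int
    other_cpus: int
    total_cpus: int


def parse_sinfo_summary(line: str) -> ClusterSummary:
    """Split a line such as '3/5/8/96/160/0/256' into its seven counts."""
    fields = line.strip().split("/")
    if len(fields) != 7:
        raise ValueError(
            f"expected 7 '/'-separated fields, got {len(fields)}: {line!r}"
        )
    # Assumption: all fields are plain integers (no abbreviated values).
    return ClusterSummary(*(int(field) for field in fields))


# Made-up sample line, only to show the field order.
summary = parse_sinfo_summary("3/5/8/96/160/0/256\n")
print(summary.total_nodes, summary.total_cpus)
```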
On that remote desktop, the command was not rerun, so you were still seeing the original information; when you restarted, it was refreshed as the PUN initialized. There is no cache server and there are no repeated calls to update it. As far as I can tell, the call is made once, on initial launch, by the OOD dashboard, and we use a Rails cache to store the result:
And some work with the Configuration as well later in that file:
The short of it is that all of this is updated when your PUN initializes.
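To make the "fetched once, then reused" behavior concrete, here is a minimal Python sketch of a compute-once, in-process cache. It is only an analogy for how a fetch-style cache (such as `Rails.cache.fetch` in Rails) behaves, not the dashboard's actual code; `ProcessCache`, `query_scheduler`, and the partition names are hypothetical.

```python
from typing import Callable, Dict, List


class ProcessCache:
    """Tiny in-memory cache: the loader runs only on the first fetch of a key."""

    def __init__(self) -> None:
        self._store: Dict[str, List[str]] = {}

    def fetch(self, key: str, loader: Callable[[], List[str]]) -> List[str]:
        if key not in self._store:
            self._store[key] = loader()  # computed once per cache lifetime
        return self._store[key]


# Stand-in for what the scheduler reports to this user (hypothetical names).
scheduler_state = {"partitions": ["cpu-short", "cpu-long"]}


def query_scheduler() -> List[str]:
    """Pretend scheduler query; returns a copy of the current partition list."""
    return list(scheduler_state["partitions"])


cache = ProcessCache()
first = cache.fetch("queues", query_scheduler)

# The user's access changes on the scheduler side...
scheduler_state["partitions"] = ["gpu-a100"]

# ...but the running process still serves the cached value.
stale = cache.fetch("queues", query_scheduler)

# A restart corresponds to a brand-new, empty cache, so the query runs again.
cache = ProcessCache()
fresh = cache.fetch("queues", query_scheduler)

print(first, stale, fresh)
```

In this sketch the only way to pick up the change is to start over with an empty cache, which is the role a PUN restart plays in the explanation above.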
Whoa, thank you for your very detailed reply and the links to the code! So it works as I expected, and the weird part during my tests was most likely due to the time it takes for a change in a user’s access to propagate. The first time, I must have reconnected to ondemand too soon after running the command to change the access. I tried again, and it takes sinfo a couple of minutes to show the correct information.
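For anyone repeating this test, one way to check that an access change has reached sinfo before reconnecting is to compare the partition names it lists before and after. The Python sketch below assumes `sinfo -h -o %P` output (one partition name per line, with the default partition marked by a trailing `*`); the function names and sample partition names are hypothetical.

```python
import subprocess
from typing import Set


def parse_partitions(sinfo_output: str) -> Set[str]:
    """Turn `sinfo -h -o %P` output into a set of partition names."""
    names = set()
    for line in sinfo_output.splitlines():
        name = line.strip().rstrip("*")  # '*' marks the default partition
        if name:
            names.add(name)
    return names


def visible_partitions() -> Set[str]:
    """Run sinfo as the current user; requires Slurm client tools on PATH."""
    result = subprocess.run(
        ["sinfo", "-h", "-o", "%P"],
        capture_output=True, text=True, check=True,
    )
    return parse_partitions(result.stdout)


# Made-up outputs captured before and after an access change.
before = parse_partitions("cpu-short*\ncpu-long\n")
after = parse_partitions("gpu-a100\n")
print("no longer visible:", sorted(before - after))
print("newly visible:", sorted(after - before))
```

If `before` and `after` are still equal, the change has not propagated yet, and a fresh ondemand session would be expected to show the old partitions.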