This doesn’t have anything specific to do with OnDemand operationally; it’s more of a general cluster question for the folks who manage them.
We’ve got users running Python jobs on the cluster that use modules that do threading (datatable, for example). When they submit a job (we are using OnDemand for this, Jupyter specifically), a number of CPUs is requested, but when the job lands on the node, the Python module happily looks at /proc/cpuinfo to determine how many CPUs there are and sets its thread count to that, instead of to what Slurm has assigned in the cgroup.
The user can do the right thing by modifying their code, for example: dt.options.nthreads = int(os.environ['SLURM_CPUS_ON_NODE']), which is great… but I’m wondering if there’s an environment variable or some other setting someone knows about that would make all processes identify only the number of CPUs actually available in the cgroup, and that can be set globally.
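For context, here’s a minimal sketch of the kind of per-job fix we’ve been suggesting to users. It assumes Slurm’s task/affinity or cgroup plugin pins the job to its allocated CPUs, so the process’s affinity mask reflects the allocation (that’s an assumption about your Slurm config, not a universal guarantee):

```python
import os
import datatable as dt

# os.cpu_count() reports every CPU on the node (same source as /proc/cpuinfo),
# while sched_getaffinity() reports only the CPUs this process may run on,
# which Slurm restricts to the job's allocation when affinity/cgroups are in use.
allocated = len(os.sched_getaffinity(0))

# Prefer Slurm's own accounting when the variable is set; fall back to the mask.
allocated = int(os.environ.get("SLURM_CPUS_ON_NODE", allocated))

dt.options.nthreads = allocated
print(f"node reports {os.cpu_count()} CPUs; using {allocated} threads")
```

That works, but it has to be repeated in every notebook or script, which is why I’m hoping for something global.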
Thanks for any insight.
If this is too off topic, feel free to delete the post.
No need to delete - it’s perfectly fine to have here and something we can support. Though I will say that https://ask.cyberinfrastructure.org/ may be a better fit.
In any case, I know RStudio has a similar issue where R libraries don’t really recognize the cgroup they’re in. I’ve found that nproc reliably returns how many cores you actually have available.
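For what it’s worth, a rough sketch of how you might sanity-check that from inside a job and pre-seed the common threading knobs. This is just an illustration, not a universal switch: nproc is assumed to be on the PATH, and the environment variables below are the usual OpenMP/BLAS ones that many (but not all) threaded libraries consult:

```python
import os
import shutil
import subprocess

# nproc honours the process's CPU affinity mask (unlike /proc/cpuinfo),
# so inside a Slurm job it reports the cores the job was actually given.
if shutil.which("nproc"):
    cores = int(subprocess.run(["nproc"], capture_output=True, text=True).stdout)
else:
    cores = len(os.sched_getaffinity(0))  # same information, without the subprocess

# Seed the common OpenMP/BLAS thread-count variables so libraries started
# from this session default to the allocation rather than the whole node.
for var in ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS"):
    os.environ.setdefault(var, str(cores))

print(f"nproc/affinity reports {cores} usable cores")
```

Libraries like datatable that roll their own thread pool may still need an explicit setting, so it’s not a complete answer, but it covers a lot of the NumPy/OpenMP-backed cases.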