This doesn’t have anything specific to do with OnDemand operationally; it’s more of a general cluster question for the folks who manage them.
We’ve got users running Python jobs on the cluster that use modules that do threading (datatable, for example). When they submit a job (we are using OnDemand for this, Jupyter specifically), a number of CPUs is requested, but when the job lands on the node, the Python module happily looks at /proc/cpuinfo to determine how many CPUs there are and sets its thread count to that, instead of to what Slurm has assigned in the cgroup.
The user can do the right thing by modifying their code, for example: dt.options.nthreads = int(os.environ['SLURM_CPUS_ON_NODE']), which is great… but I’m wondering if there’s an environment variable or some other setting someone knows about that would make all processes identify only the number of CPUs actually available in the cgroup, and that can be set globally.
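For context, here’s a minimal sketch of the kind of per-job fix we’ve been suggesting to users. It assumes Slurm’s task/affinity or cgroup plugin pins the job to its allocated CPUs, so the process’s affinity mask reflects the allocation (that’s an assumption about your Slurm config, not a universal guarantee):

```python
import os
import datatable as dt

# os.cpu_count() reports every CPU on the node (same source as /proc/cpuinfo),
# while sched_getaffinity() reports only the CPUs this process may run on,
# which Slurm restricts to the job's allocation when affinity/cgroups are in use.
allocated = len(os.sched_getaffinity(0))

# Prefer Slurm's own accounting when the variable is set; fall back to the mask.
allocated = int(os.environ.get("SLURM_CPUS_ON_NODE", allocated))

dt.options.nthreads = allocated
print(f"node reports {os.cpu_count()} CPUs; using {allocated} threads")
```

That works, but it has to be repeated in every notebook or script, which is why I’m hoping for something global.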
Thanks for any insight.
If this is too off topic, feel free to delete the post.
No need to delete - it’s perfectly fine to have here and something we can support. Though I will say that https://ask.cyberinfrastructure.org/ may be a better fit.
In any case, I know RStudio has a similar issue where R libraries don’t really recognize the cgroup they’re in. I’ve found that nproc reliably returns how many cores you actually have available.
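For what it’s worth, a rough sketch of how you might sanity-check that from inside a job and pre-seed the common threading knobs. This is just an illustration, not a universal switch: nproc is assumed to be on the PATH, and the environment variables below are the usual OpenMP/BLAS ones that many (but not all) threaded libraries consult:

```python
import os
import shutil
import subprocess

# nproc honours the process's CPU affinity mask (unlike /proc/cpuinfo),
# so inside a Slurm job it reports the cores the job was actually given.
if shutil.which("nproc"):
    cores = int(subprocess.run(["nproc"], capture_output=True, text=True).stdout)
else:
    cores = len(os.sched_getaffinity(0))  # same information, without the subprocess

# Seed the common OpenMP/BLAS thread-count variables so libraries started
# from this session default to the allocation rather than the whole node.
for var in ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS"):
    os.environ.setdefault(var, str(cores))

print(f"nproc/affinity reports {cores} usable cores")
```

Libraries like datatable that roll their own thread pool may still need an explicit setting, so it’s not a complete answer, but it covers a lot of the NumPy/OpenMP-backed cases.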