Hi. I’m trying to get the Grafana integration working and I’m a little stuck.
I see that it requires the use of the OnDemand Clusters dashboard, so I have that installed and have been working to get that functional.
I have a Prometheus configuration set up, and that piece looks correct: the nodes are exporting data and Prometheus is storing it. I configured prometheus.yml to have this for each node:
- targets: ['clusternodeA.domain.org:9100']
  labels:
    role: compute
    cluster: mycluster
(The documentation didn’t say I needed these labels; I figured it out by looking at the variables in the Grafana dashboard.) Also, in the documentation, the relabel_configs example has [__address__] in quotes, and Prometheus (2.27.1) didn’t like that, but taking the quotes out made it work.
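For context, here is a minimal sketch of how those pieces fit together in prometheus.yml. The job name, the regex, and the host target label are placeholders I am assuming for illustration, not values copied from the documentation; the only parts taken from my setup are the target, the two labels, and the unquoted source label.

```yaml
scrape_configs:
  - job_name: node                      # assumed job name
    static_configs:
      - targets: ['clusternodeA.domain.org:9100']
        labels:
          role: compute                 # matched by the dashboard's variables
          cluster: mycluster
    relabel_configs:
      - source_labels: [__address__]    # unquoted, which is what 2.27.1 accepted for me
        regex: '([^.]+)\..*'            # assumed: keep the short hostname
        target_label: host              # assumed label name
        replacement: '$1'
```

With a rule like this, a target of clusternodeA.domain.org:9100 would get host set to clusternodeA, if the dashboard expects short hostnames.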
Now the CPU Load and Memory Usage graphs are showing data, but CPU Usage has none. That panel queries node_cpu_load_system, which isn’t a metric Prometheus is serving; it has node_cpu_seconds_total for various modes, but not that particular metric. Is this an incompatibility with my version of node exporter (1.1.2)?
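A possible stopgap, sketched here as an assumption rather than a confirmed fix, would be a Prometheus recording rule that derives the missing series from node_cpu_seconds_total. I am guessing that the dashboard expects a per-node percentage of time spent in system mode; the group name, the 5m window, and the choice of labels to keep are all hypothetical.

```yaml
groups:
  - name: ood_cpu_compat                # hypothetical group name
    rules:
      # Assumption: the panel wants percent of CPU time in system mode per node.
      - record: node_cpu_load_system
        expr: |
          100 * avg by (instance, cluster, role) (
            rate(node_cpu_seconds_total{mode="system"}[5m])
          )
```

The file would need to be listed under rule_files in prometheus.yml. If the dashboard actually expects a different unit or label set, the expression would have to change to match.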
Also, for the Moab graphs: I’m using Slurm, and there’s no info anywhere in the documentation on what these panels are looking for. I’m sure I can make it work for Slurm, but what Prometheus config is in use, or what collection needs to be set up, to get valid data?
Lastly, the Active Jobs dashboard in OOD (2.0.8) does not show the integrated graphs. If I expand a job, I get a blank screen with the job info and a Detailed Metrics link. If I click on that link, I get the Grafana page with the data.
I know that was a lot, but thanks for any help you can offer.