Help troubleshooting poor dashboard load performance

Hello,

We are seeing some performance issues with loading the dashboard. I know this issue has been discussed before, so I’ll do my best to cover all the bases.

Environment:
OOD 3.0.1
RHEL 8.5
CAS for auth with AD backend via SSSD

Issue:
Loading the dashboard with a PUN already running takes under 1s (~300-800ms). If a PUN is not already running, or after restarting the web server, the dashboard takes 4-16s to load, roughly an order of magnitude slower. I don't have a large sample of test cases, but anecdotally, accounts with sandbox development enabled tend to sit at the upper end of both ranges, i.e. closer to 1s with a PUN running and 8-16s after restarting the web server.
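To firm up the cold-versus-warm comparison with a larger sample, individual load times (for example, as read from the network tab of the browser's developer tools) could be collected one value in milliseconds per line and summarized. The helper below is a hypothetical sketch, not part of OOD; it assumes only standard `sort` and `awk`.

```shell
# Hypothetical helper (not part of OOD): read load times in ms, one per line,
# on stdin and print the count, min, median and max.
summarize_ms() {
  sort -n | awk '
    { v[NR] = $1 }                      # values arrive already sorted ascending
    END {
      if (NR == 0) exit 1               # no samples: print nothing, fail
      if (NR % 2) med = v[(NR + 1) / 2] # odd count: middle value
      else med = (v[NR / 2] + v[NR / 2 + 1]) / 2  # even count: mean of middle two
      printf "n=%d min=%s median=%s max=%s\n", NR, v[1], med, v[NR]
    }'
}
```

For example, `summarize_ms < cold_ms.txt` and `summarize_ms < warm_ms.txt` (file names are placeholders) give directly comparable summaries per account.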

What I’ve tried so far:
I’ve read most of the existing threads on this issue. In no particular order:

I’ve also attempted to debug the process by manually stopping and restarting the PUN from the terminal on the OOD server:

```shell
/opt/ood/nginx_stage/sbin/nginx_stage nginx_clean -u ndusek
/opt/ood/nginx_stage/sbin/nginx_stage pun -u ndusek
```

In those cases, load times are <1s, consistent with a reload of the dashboard when the PUN is already running.

So at this point, it seems that the dashboard does something that takes longer when a PUN is not already running, and that this time grows with the number of apps it has to serve. In other words: with a new PUN, does the dashboard have to enumerate all the apps a user has access to, with assets/apps cached and served more quickly after that initial load?
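One quick way to check whether load time tracks the number of apps is to count the sandbox apps for each test account. This sketch assumes sandbox apps live in `~/ondemand/dev` (the usual OOD sandbox location, but worth verifying on your installation); `count_app_dirs` is a hypothetical helper.

```shell
# Hypothetical helper: count app directories directly under a sandbox root.
# Defaults to ~/ondemand/dev (assumed sandbox location). Symlinks to
# directories are not counted because find -type d does not follow them.
count_app_dirs() {
  local root="${1:-$HOME/ondemand/dev}"
  if [ ! -d "$root" ]; then
    echo 0                              # no sandbox directory: zero apps
    return
  fi
  find "$root" -mindepth 1 -maxdepth 1 -type d | wc -l | tr -d ' '
}
```

Run it as each test user and set the count next to that user's cold load time.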

I hope what I’m saying makes sense or rings a bell with the OOD developers. Let me know if I can provide any more information.

Thanks,
Nick

Another data point:

We found that clearing out $OOD_DATAROOT/sys and $OOD_DATAROOT/dev reduced the load time for accounts with the slowest responses from 8-16s down to ~4s. So that accounts for the difference between “slow” and “really slow”.
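To see how much has accumulated in those directories on the slow and fast accounts before clearing them again, a file count per directory is a cheap comparison. This is a hypothetical helper; `OOD_DATAROOT` should be set to the same path that was cleared, since it may not be defined in an ordinary login shell.

```shell
# Hypothetical helper: print "<file count><TAB><dir>" for each directory given.
# Missing directories are reported with a count of 0.
report_file_counts() {
  local dir n
  for dir in "$@"; do
    if [ -d "$dir" ]; then
      n=$(find "$dir" -type f | wc -l | tr -d ' ')
    else
      n=0
    fi
    printf '%s\t%s\n' "$n" "$dir"
  done
}
```

Usage: `report_file_counts "$OOD_DATAROOT/sys" "$OOD_DATAROOT/dev"`.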

This implies that active users who run interactive apps more often will see slower load times. What are some ways to address this? I seem to remember a setting that controls the card lifetime for past interactive jobs, but does it actually clear out the sessions' output directories?
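Whatever the built-in setting does, a dry run that lists old session output directories shows how much a cleanup would remove. This sketch assumes each session's files sit in an immediate child of a directory named `output` (e.g. `.../batch_connect/sys/<app>/output/<session-id>`); that layout and the helper name are assumptions. It only prints paths and deletes nothing.

```shell
# Hypothetical helper: list session output directories older than N days.
# Matches only immediate children of any directory named "output"
# (assumed layout). Dry run: prints paths, removes nothing.
list_old_session_dirs() {
  local root="$1" days="$2"
  find "$root" -type d -path '*/output/*' ! -path '*/output/*/*' \
    -mtime +"$days" -print
}
```

Usage: `list_old_session_dirs "$OOD_DATAROOT" 30` lists session directories untouched for more than 30 days.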

Also, that still leaves the extra ~3s delay unaccounted for when restarting the web server or reloading the dashboard after several minutes of inactivity. Any idea where that could be coming from?

Nick

Hello and thanks for the info!

We aren't sure offhand what is causing this and will need to look into it further.

I've opened an issue so we can track this and dig into what is happening. Sorry not to have a better answer right away. The issue can be seen here. Please feel free to add any further information and data you see fit.

