We’ve been having this issue with our OnDemand instance (2.0.10) for a while and haven’t been able to find a solution yet. We’ve tried a few things on our end without luck.
Basically, the dashboard seems to take a very long time to load.
The actual authentication part does not seem to be the issue, as that happens within a matter of seconds. This was an issue back when we were using LDAP auth and it still is now with ADFS. It just seems to sit there forever waiting to initialize the dashboard. It is common for it to take 30 seconds to a minute or more.
I saw a topic in OnDemand’s known issues regarding slow dashboard performance due to ERB logic. The only thing that’s really changed on our side is that we added interactive apps such as Jupyter and JupyterLab. To test, I backed those up, removed them from the apps directory, and restarted OnDemand, but the slowness is still there.
The only difference we notice is that newer accounts tend to take less time to load than established accounts. If someone is a member of multiple groups in AD, that seems to add to the time it takes to load the dashboard. I’m not sure why this occurs, whether it is relevant, or what is happening in the background.
Clicking on Job Composer, etc. once you’re in yields the same issue.
We’ve been stuck on this one for quite some time. Any ideas are really appreciated.
I’d turn mod_auth_mellon’s loglevel to maybe debug (I think that’s the right module name for Shibboleth ADFS) and it’ll tell you how long it takes the AD to respond.
If ADFS is responding fairly quickly, then it’s the PUN actually starting up that takes a long time. Hopefully ADFS is slow, because that is a much easier issue to solve, or at least triage: you can see directly in the apache logs that an ADFS request is made at time T and the response arrives 25 seconds later.
If you can’t pinpoint the latency to ADFS, then we’re going to have to look into the PUN startup, which you can execute manually.
Here’s the command to start up a PUN for a user named jeff. Now you can strace it and/or attach any sort of debugging tools to this command (or even time as a spot check).
[root@ac4df2f20b1e ~]# /opt/ood/nginx_stage/sbin/nginx_stage pun -u jeff
Here’s the command to clean up that PUN.
[root@ac4df2f20b1e ~]# /opt/ood/nginx_stage/sbin/nginx_stage nginx_clean -u jeff
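As a rough sketch of the "time as a spot check" idea, the start and cleanup commands can be wrapped in a small script. Only the two nginx_stage invocations come from this thread; the script layout, the wall_seconds helper, and the OOD_USER and NGINX_STAGE variables are assumptions made up for illustration, not OnDemand settings.

```shell
#!/usr/bin/env bash
# Spot-check how long a PUN takes to start for one user. Run as root.
# OOD_USER and NGINX_STAGE are illustrative variables, not OnDemand settings.
OOD_USER="${OOD_USER:-jeff}"
NGINX_STAGE="${NGINX_STAGE:-/opt/ood/nginx_stage/sbin/nginx_stage}"

# Print the whole wall-clock seconds a command takes; its own output is discarded.
wall_seconds() {
  local start=$SECONDS
  "$@" >/dev/null 2>&1
  echo $(( SECONDS - start ))
}

if [ -x "$NGINX_STAGE" ]; then
  echo "PUN startup for $OOD_USER: $(wall_seconds "$NGINX_STAGE" pun -u "$OOD_USER")s"
  # Clean up the PUN we just started so it does not linger.
  "$NGINX_STAGE" nginx_clean -u "$OOD_USER"
else
  echo "nginx_stage not found at $NGINX_STAGE" >&2
fi
```

Whole-second resolution is coarse, but it should be enough to tell a 2 second startup from a 30 second one. If the number here is small while the browser still waits a long time, the delay is more likely in authentication or in the dashboard app itself.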
Jeff, thanks for the reply. It turns out that sssd needed a bit more tuning. After we implemented some changes, the loading time dropped to about 10 seconds or so. I appreciate the help.
Can you specify which SSSD configs you had to tune? It may be helpful for other folks who find this topic.
The main ones that made the most difference were:
ignore_group_members = True
ldap_purge_cache_timeout = 0
subdomain_inherit = ignore_group_members, ldap_purge_cache_timeout
We also mounted the sssd cache in RAM via fstab.
We have a rather large AD deployment, so this seems to have had quite a bit of impact on performance. I grabbed these suggestions from a Red Hat article here.
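For anyone who wants to check whether group resolution is the slow part on their own system, one rough approach is to time an id lookup for an affected account before and after changing the sssd settings. This is only a sketch: the elapsed_ms helper and the LOOKUP_USER variable are made up for illustration, and the millisecond timing assumes GNU date (for %N).

```shell
#!/usr/bin/env bash
# Rough check of how long group resolution takes for one account.
# LOOKUP_USER is an illustrative variable; it defaults to the current user.
LOOKUP_USER="${LOOKUP_USER:-$(id -un)}"

# Print elapsed milliseconds for a command (relies on GNU date's %N).
elapsed_ms() {
  local start end
  start=$(date +%s%N)
  "$@" >/dev/null 2>&1
  end=$(date +%s%N)
  echo $(( (end - start) / 1000000 ))
}

# The first lookup may have to go out to the directory; the second is
# usually answered from the sssd cache, so a large gap points at the backend.
echo "first lookup:  $(elapsed_ms id "$LOOKUP_USER") ms"
echo "second lookup: $(elapsed_ms id "$LOOKUP_USER") ms"
echo "groups for $LOOKUP_USER: $(id -G "$LOOKUP_USER" | wc -w)"
```

Running it as, say, LOOKUP_USER=jeff for an established account and again for a newer one should show whether the lookup time grows with the number of groups.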