Multiple portal instances pointing to one cluster

Hi,
I am looking into multiple portal instances for our cluster.
Is there an option/configuration to spin up multiple portal instances which interact with a single cluster?
Our current setup is hosted on AWS ParallelCluster. We have one OOD portal node pointing to the following:
Portal node → Login Node (used for SSH connection) → Head Node → HPC Cluster/Compute Nodes
The main problem we are encountering is that the portal instance gets bogged down when many users log in simultaneously. Its resources are exhausted (100% CPU utilization and around 70% RAM utilization), which eventually brings down the instance and leaves it unresponsive.

Thank you in advance!

-Vesna

There are no settings in OnDemand for this, but that's OK. All you need is a load balancer that supports sticky sessions: once a user is routed to a given instance, they're always routed to that same instance.
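As a rough illustration of what that looks like in practice, here is a minimal sketch of an open-source nginx front end using `ip_hash` for stickiness (pinning each client IP to one backend). The hostnames and certificate details are placeholders, not anything from your setup; HAProxy with cookie-based stickiness would work just as well.

```nginx
upstream ood_portals {
    # ip_hash pins each client IP to the same backend, giving
    # sticky sessions in open-source nginx.
    ip_hash;
    server ood-portal-1.internal:443;
    server ood-portal-2.internal:443;
}

server {
    listen 443 ssl;
    server_name ondemand.example.edu;
    # ssl_certificate / ssl_certificate_key directives omitted here

    location / {
        proxy_pass https://ood_portals;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```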


I’ll also point out you can have completely different front end nodes that people have to explicitly go to via different names. For example, at OSC we have ondemand.osc.edu which is for general client use. We also have a separate Open OnDemand instance at class.osc.edu that is customized for classroom / student use. Clients can technically log into either one since they both connect to identical resources.


What makes the sticky sessions necessary? I’ve seen in other threads that the home directory was mentioned as a limiter, but in our setup we do have a shared EFS volume (Vesna and I work on the same setup).

I’m also wondering what would happen to a user if the portal node running OOD that they’re “stuck” to is unavailable due to problems, and they get routed to a different one.


CSRF tokens. Basically, when you hit a web form (like the ones to submit batch jobs), the app makes sure that the POST request you send has the right token for that request.

So you need to ensure that users see the same instance so they’re passing the right CSRF tokens to it. Otherwise if you get a token from one instance, the other instance won’t recognize it when you pass it to submit a job and it’ll fail.

If they try to submit a job right when they get re-routed, it will likely fail on the second instance, but a retry will succeed. This seems like an edge case, though: the failover would have to happen at the exact moment the user is submitting a job. Unlikely given the timing, but possible.
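To make the token-mismatch failure mode concrete, here is a hypothetical sketch (not OOD's actual implementation) where each portal instance signs CSRF tokens with its own secret key. A token minted by instance A then fails verification on instance B, which is exactly what sticky sessions prevent.

```python
import hashlib
import hmac
import secrets

def make_token(secret: bytes, session_id: str) -> str:
    """Sign the session ID with this instance's secret to mint a CSRF token."""
    return hmac.new(secret, session_id.encode(), hashlib.sha256).hexdigest()

def verify(secret: bytes, session_id: str, token: str) -> bool:
    """Recompute the token with this instance's secret and compare."""
    return hmac.compare_digest(make_token(secret, session_id), token)

secret_a = secrets.token_bytes(32)  # instance A's per-instance key
secret_b = secrets.token_bytes(32)  # instance B's per-instance key

# The job-submission form is rendered by instance A...
token = make_token(secret_a, "user-session-123")

# ...so the POST succeeds only if it lands back on instance A.
print(verify(secret_a, "user-session-123", token))  # True: same instance
print(verify(secret_b, "user-session-123", token))  # False: routed to B
```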


Okay, that makes sense. Thanks for the speedy reply!