Shell app dies exactly after 60 seconds of idleness

zackramjan · April 26, 2023, 9:11pm

We have been working to roll out OOD on top of our existing Slurm cluster.
We are now a RHEL 9-based shop, so we are using the ondemand-release-web-3.0-1.noarch.rpm repo.

Everything seemed to install easily enough, including the Slurm integration that lets us use the ssh/shell app to reach our submit node, except for one persistent issue:

No matter what we do, after exactly 60 seconds of idleness in the shell app, the connection closes with “Your connection to the remote server has been terminated.” Keeping the shell app active with ‘top’ prevents the disconnect. We suspect the websocket is being dropped. Here are the things we’ve investigated that have not helped:

We first thought it was ssh, but ssh is not timing out. Our ssh ClientAlive values are set to handle idleness, and we don’t see any issues when idling with other clients.

More conclusively, even after the browser shows “Your connection to the remote server has been terminated.”, the ssh process is still alive for some time, so ssh is not what is dying:

zack.ra+   59985   59857  1 16:42 ?        00:00:02 Passenger RubyApp: /var/www/ood/apps/sys/dashboard (production)
zack.ra+   60052   59857  0 16:43 ?        00:00:00 Passenger NodeApp: /var/www/ood/apps/sys/shell
zack.ra+   60095   60052  0 16:43 pts/0    00:00:00 ssh access.hpc.vai.org

We then figured it was an nginx timeout issue. To test this, we noted that each user gets a generated /var/lib/ondemand-nginx/config/puns/username.conf.

Within that conf is an
include /var/lib/ondemand-nginx/config/apps/sys/*.conf;
so we added the following to /var/lib/ondemand-nginx/config/apps/sys/shell.conf:

location ~ ^/pun/sys/shell(/.*|$) {
  proxy_read_timeout 600s;
  proxy_connect_timeout 600s;
  proxy_send_timeout 600s;
  uwsgi_read_timeout 600s;
  uwsgi_connect_timeout 600s;
  uwsgi_send_timeout 600s;
...

This made no difference, and neither did various permutations of these timeout values. Running nginx manually with my personal config:

/opt/ood/ondemand/root/usr/sbin/nginx -c /var/lib/ondemand-nginx/config/puns/zack.ramjan.conf -T

shows that the timeout settings are being read by nginx.
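Because the -T dump can be long, a small filter makes it easier to confirm which timeout directives land in which file. The sketch below is a hypothetical helper (`findTimeoutDirectives` is not part of OOD or nginx); it assumes the `# configuration file <path>:` header lines that `nginx -T` prints before each included file, and at most one directive per line.

```javascript
// Hypothetical helper (not part of OOD or nginx): list every *_timeout
// directive in `nginx -T` output together with the file it came from.
// Assumes the "# configuration file <path>:" header lines that `nginx -T`
// prints before each file, and at most one directive per line.
function findTimeoutDirectives(dumpText) {
  const found = [];
  let currentFile = null;
  for (const rawLine of dumpText.split("\n")) {
    const line = rawLine.trim();
    // A header line switches the "current file" for the lines that follow.
    const header = line.match(/^# configuration file (.+):$/);
    if (header) {
      currentFile = header[1];
      continue;
    }
    // Skip blank lines and commented-out directives.
    if (line === "" || line.startsWith("#")) continue;
    // Matches e.g. "proxy_read_timeout 600s;" or "uwsgi_send_timeout 600s;".
    const directive = line.match(/^([a-z_]+_timeout)\s+([^;]+);/);
    if (directive) {
      found.push({ file: currentFile, name: directive[1], value: directive[2].trim() });
    }
  }
  return found;
}

// Illustrative input shaped like a fragment of `nginx -T` output.
const demoDump = [
  "# configuration file /var/lib/ondemand-nginx/config/apps/sys/shell.conf:",
  "location ~ ^/pun/sys/shell(/.*|$) {",
  "  proxy_read_timeout 600s;",
  "  uwsgi_read_timeout 600s;",
  "}",
].join("\n");

for (const d of findTimeoutDirectives(demoDump)) {
  console.log(`${d.file}\t${d.name}\t${d.value}`);
}
```

On a real system one would save the command's output to a file and pass its contents in as `dumpText`; a directive that shows up in the expected file but still changes nothing points away from nginx as the culprit.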

We also tried adding a ping to the shell app in /var/www/ood/apps/sys/shell/app.js, in the hope that it would keep the websocket alive:

wss.pingInterval = setInterval(() => {
	ws.ping();
},4000);

This just made the app error out after 4 seconds, i.e., we received “Your connection to the remote server has been terminated.” right after the ping. Admittedly, I have no idea what I’m doing here, but thought it was worth a shot.
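For comparison, the `ws` package README describes a heartbeat in which each connection is pinged on its own socket and marked alive again when it answers with a pong. Below is a minimal sketch of that pattern, assuming `wss` in app.js is a `ws` WebSocketServer, so that `wss.clients` holds the per-connection sockets. `startHeartbeat` and `heartbeatTick` are hypothetical names. One possible reason the snippet above failed is that a single `ws` variable at that scope may not be one of the live connection sockets, but that is a guess.

```javascript
// Sketch of the per-connection heartbeat pattern from the `ws` README.
// Assumption: `wss` behaves like a `ws` WebSocketServer, i.e. `wss.clients`
// is a Set of sockets exposing ping() and terminate() and emitting 'pong'.
// Function names are hypothetical, not part of the OOD shell app.

// One heartbeat round over all connected sockets.
function heartbeatTick(clients) {
  for (const socket of clients) {
    // No pong since the previous round: treat the peer as gone.
    if (socket.isAlive === false) {
      socket.terminate();
      continue;
    }
    socket.isAlive = false;
    // Ping/pong frames put bytes on the wire in both directions, which is
    // assumed to count as activity for any idle timer on the proxies in between.
    socket.ping();
  }
}

function startHeartbeat(wss, intervalMs) {
  // Mark each new socket alive, and re-mark it whenever it answers a ping.
  wss.on("connection", (socket) => {
    socket.isAlive = true;
    socket.on("pong", () => {
      socket.isAlive = true;
    });
  });

  const timer = setInterval(() => heartbeatTick(wss.clients), intervalMs);

  // Stop pinging when the server shuts down.
  wss.on("close", () => clearInterval(timer));
  return timer;
}
```

In app.js this would be called once after the server is created, e.g. `startHeartbeat(wss, 30000)`, with an interval comfortably shorter than the 60-second cutoff. Whether every proxy in the chain treats ping/pong frames as activity is an assumption worth verifying on your own setup.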

But it also made me wonder whether, rather than an actual timeout, some event occurring at the 60-second mark was causing a failure and killing the session.

Watching the Apache and nginx user logs didn’t turn up anything interesting.

Any advice is greatly appreciated, as we are looking forward to what OOD can mean for our HPC users.

jeff.ohrstrom · April 27, 2023, 1:53pm

I suspect what you’re seeing requires a code change, i.e., that there’s no setting for this.

I’ll open or find an upstream ticket for this. But again, I suspect it’s Apache’s 60-second timeout, and that we’re not ping/ponging the server to keep the connection alive.
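The reasoning here generalizes: each hop between the browser and the ssh process (Apache, the per-user nginx, the shell app) can close a connection that stays silent longer than its own idle timeout, so the smallest limit decides when an idle session dies, and any keepalive has to fire more often than that. The helper names below are hypothetical. The 60 s figure is Apache httpd’s documented default for `Timeout` (which `ProxyTimeout` falls back to when unset), and 600 s is the nginx value from the first post.

```javascript
// Hypothetical helpers: model each proxy hop's idle timeout in seconds.
// Whichever hop's idle timer fires first closes the connection, so the
// effective cutoff for an idle session is the minimum across hops.
function effectiveIdleTimeout(hops) {
  const values = Object.values(hops);
  if (values.length === 0) throw new Error("no hops given");
  return Math.min(...values);
}

// Choose a keepalive interval safely below that cutoff. Using half of it
// is an arbitrary safety margin picked for illustration.
function safePingInterval(hops, fraction = 0.5) {
  return Math.floor(effectiveIdleTimeout(hops) * fraction);
}

// Apache left at its default Timeout; nginx raised to 600 s as in the first post.
const hopsSeconds = { apache: 60, nginx: 600 };
console.log(effectiveIdleTimeout(hopsSeconds)); // 60
console.log(safePingInterval(hopsSeconds)); // 30
```

Under this model, raising only the nginx timeouts would have no visible effect, because Apache’s 60 s would still be the minimum; that is consistent with what the first post reports, though it does not by itself prove Apache is the hop closing the connection.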

zackramjan · April 28, 2023, 3:10pm

OK, thanks. We are happy to help with testing, etc.

jeff.ohrstrom · April 28, 2023, 3:20pm

I found this ticket, which I just scheduled for the 3.1 release.

zackramjan · April 29, 2023, 3:46am

I think I have a fairly simple fix that seems to be working.

Create a new conf file in Apache’s conf.d containing the following:

cat /etc/httpd/conf.d/proxytimeouts.conf

TimeOut 600
ProxyTimeout 600
KeepAlive On
KeepAliveTimeout 600

This globally sets several Apache timeout values, which seems to prevent the 60-second disconnects.

It’s likely that not all of the above are needed. In my few tests, the connection lasted longer than 600 seconds before dying (it appeared to die after ~1000 seconds). I will try to narrow down which config options actually help after the weekend.

system · October 26, 2023, 3:46am

This topic was automatically closed 180 days after the last reply. New replies are no longer allowed.
