SSH keep alive Shell Access on OnDemand

Hello,

Is there a timeout setting for Cluster Shell Access on OnDemand? When the session was idle, I got “Your connection to the remote server has been terminated”.

Thank you,

Presuming you’re running bash as your default shell, what does

echo $TMOUT

print? On our interactive nodes (regardless of how the user got there), we set TMOUT via a script in /etc/profile.d. A similar setting for csh/tcsh is “autologout”.
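For illustration, a profile.d script along these lines could set the bash idle timeout. This is only a sketch: the file name tmout.sh, the helper name set_idle_timeout, and the 2-hour value are assumptions, not the script actually used on any particular site.

```shell
# Hypothetical /etc/profile.d/tmout.sh (file name and value are assumptions).
# bash logs out an interactive shell after TMOUT seconds without input;
# TMOUT=0 or an unset TMOUT disables the timeout.
set_idle_timeout() {
  TMOUT="$1"
  export TMOUT
}

# Apply a 2-hour limit to interactive shells only.
case "$-" in
  *i*) set_idle_timeout 7200 ;;
esac
```

If echo $TMOUT prints 0 or nothing, no such limit is in effect for that shell.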

Cheers,

Ric

Thanks Ric,

$TMOUT is 0, so it’s disabled. Since users launch it from the Open OnDemand web page, I think it’s using the OOD shell client, but I can’t find any timeout setting there.

@gp4r ood-shell does open an actual ssh session; it’s just that all the output is captured and presented through a web page.

From what I can tell from a little research, timeout settings are on the server side. I.e., the server decides when to kick you. You should be able to see the server’s settings through grep -i alive /etc/ssh/sshd_config (the client-side defaults live in /etc/ssh/ssh_config).
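As a quick check, something like the following lists only the keepalive directives that are actually set, skipping commented-out lines. It is a sketch: show_alive_settings is a hypothetical helper name, and it reads just the one file it is given (no Include or Match handling).

```shell
# show_alive_settings FILE: print the uncommented keepalive directives in an
# ssh_config or sshd_config style file. Hypothetical helper, not part of OOD.
show_alive_settings() {
  grep -iE '^[[:space:]]*(ClientAlive|ServerAlive|TCPKeepAlive)' "$1"
}

# Example (the usual OpenSSH paths; they may differ on your system):
#   show_alive_settings /etc/ssh/sshd_config   # ClientAlive* (server side)
#   show_alive_settings /etc/ssh/ssh_config    # ServerAlive* (client side)
```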

We use node-pty (https://github.com/microsoft/node-pty/) but I can’t seem to find if you can pass connection options to it. But it does seem reasonable that we could send null characters every so often to keep the connection alive, like PuTTY does; we just don’t currently do that.

@gp4r how long was the session “idle”, and can you reproduce it? Was it that you had the shell app open in the web browser, let’s say with a file open in vim or just sitting in bash in a subdirectory, left for a coffee break, and came back to see “Your connection to the remote server has been terminated”? I wonder if we could reproduce something similar here at OSC…

@efranz I haven’t checked the exact time, but for a test, I just let the terminal session sit idle for about 3 to 4 hours.

@jeff.ohrstrom we have set ClientAliveInterval and ClientAliveCountMax on the server side to keep sessions alive, so a normal ssh connection doesn’t terminate. Since I saw it only with OOD shell access, I thought there was a timeout in the OOD shell.
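As a rough guide to what those two settings mean together: sshd sends a probe after ClientAliveInterval seconds without data from the client and drops the connection after ClientAliveCountMax unanswered probes, so an unresponsive client is disconnected after roughly the product of the two. A client that answers the probes is not dropped by these settings, however idle it is. The sketch below computes that product from a config file; unresponsive_window is a hypothetical helper, and it ignores Include and Match blocks.

```shell
# unresponsive_window FILE: print ClientAliveInterval * ClientAliveCountMax
# in seconds. sshd uses the first occurrence of each keyword; the OpenSSH
# defaults are 0 (probing disabled) and 3.
unresponsive_window() {
  awk '
    tolower($1) == "clientaliveinterval" && !i_set { i = $2; i_set = 1 }
    tolower($1) == "clientalivecountmax" && !c_set { c = $2; c_set = 1 }
    END {
      if (!i_set) i = 0
      if (!c_set) c = 3
      print i * c
    }
  ' "$1"
}
```

A result of 0 means the server is not probing clients at all.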

There are also tools to allow a saved session using the ‘screen’ and ‘tmux’ commands.
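For example, with tmux the work keeps running on the node even if the web terminal is disconnected. A minimal sketch, where work_session and the default session name "work" are made up for illustration:

```shell
# work_session [NAME]: attach to the named tmux session, creating it first if
# it does not exist yet (-A makes new-session behave like attach-session
# when the session already exists).
work_session() {
  tmux new-session -A -s "${1:-work}"
}
```

After a disconnect, open a new shell on the same node and run work_session again to pick up where you left off.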

@gp4r I’m curious; here at OSC we have a reaper script that periodically kills inactive PUNs using nginx_stage. After the reaper swings by, that is the message I would expect to see in a long-running terminal session. Do you have anything like that set up?

@rodgers.355 There is an ood cron job:

[root@ood-prod cron.d]# cat ood
#!/bin/bash
PATH=/sbin:/bin:/usr/sbin:/usr/bin
0 */2 * * * root [ -f /opt/ood/nginx_stage/sbin/update_nginx_stage ] && /opt/ood/nginx_stage/sbin/update_nginx_stage --quiet

I know it was set by ood as the default. Do you see a problem with increasing the interval to once a day?

No, but I wonder whether this is the culprit for killing the session, or if Passenger is killing the process itself. I would have expected that as long as the web socket connection was maintained between the browser and the shell app, the nginx_stage nginx_clean command, executed by update_nginx_stage, would not kill the PUN.

Passenger is configured to kill a server process by default after 5 or 10 minutes of inactivity. An active websocket connection is supposed to ensure this stays open.

@rodgers.355 @jeff.ohrstrom I think we should do some testing to determine how to reproduce the problem here at OSC.

Do we really want idle sessions lasting longer than this?

Or is the real problem the fact that a user’s work is highly coupled with an ssh session? If we can decouple the work from the session, maybe that’s a better route than allowing users to be idle for super long periods of time.

@gp4r We run that ood cron as well (ood 1.5, CentOS 7) and haven’t seen anyone complain about terminal session disconnects.

Ric

That cron doesn’t kill active terminal sessions.

@rodgers.355 yes, the cron doesn’t kill the terminal session.

Actually, it was due to the network connection. I disconnected my network for 1 minute and got different results in different browsers.

Safari: Terminated immediately
Firefox: Terminated after about 9 sec
Chrome: Stayed alive

Do you know which browser setting controls this?

Thank you

I have opened an issue to track this work: https://github.com/OSC/ood-shell/issues/68

There is indeed a “bug” with the shell app. The current implementation couples the ws connection object with the terminal session object. The result is when a ws connection gets closed, say due to network disconnection, the shell app closes the terminal session.

If these two are decoupled and then the code is updated to enable reconnecting to an existing terminal session via a new websocket connection, then that will ensure that the timeout imposed on a terminal session is at least 5 minutes, or whatever the idle timeout is set for Passenger. Increasing that ultimate timeout is another problem altogether however…