SHELL app and websocket inactivity timeouts

Hi,

When I open a noVNC desktop application in OOD 3.0.3, the browser tab times out after 1 minute of inactivity. I have seen other threads about this, but none of the suggestions in them have helped me resolve the issue. I set passenger_pool_idle_time: 7200 in nginx_stage.yml and restarted the web server, but I am still seeing the problem. Where should this timeout be set?

Thx,
Brad

Also, I'm running RHEL 9, and I see the following in /var/log/ondemand-nginx/user/error.log, which could be of interest.

App 2309820 output: [2024-02-15 21:17:14 -0800 ] INFO "execve = [{"SLURM_CONF"=>"/software/slurm/slurm.conf"}, "/usr/bin/squeue", "--all", "--states=all", "--noconvert", "-o", "\u001E%a\u001F%A\u001F%B\u001F%c\u001F%C\u001F%d\u001F%D\u001F%e\u001F%E\u001F%f\u001F%F\u001F%g\u001F%G\u001F%h\u001F%H\u001F%i\u001F%I\u001F%j\u001F%J\u001F%k\u001F%K\u001F%l\u001F%L\u001F%m\u001F%M\u001F%n\u001F%N\u001F%o\u001F%O\u001F%q\u001F%P\u001F%Q\u001F%r\u001F%S\u001F%t\u001F%T\u001F%u\u001F%U\u001F%v\u001F%V\u001F%w\u001F%W\u001F%x\u001F%X\u001F%y\u001F%Y\u001F%z\u001F%Z\u001F%b", "-j", "125646", "-M", "c1"]"
App 2309820 output: [2024-02-15 21:17:15 -0800 ] INFO "method=GET path=/pun/sys/dashboard/batch_connect/sessions.js format=js controller=BatchConnect::SessionsController action=index status=200 duration=245.92 view=17.05"
App 2309820 output: [2024-02-15 21:17:23 -0800 ] WARN "Error opening MOTD at \nException: bad URI(is not URI?): nil"
App 2309820 output: [2024-02-15 21:17:23 -0800 ] INFO "method=GET path=/pun/sys/dashboard/ format=html controller=DashboardController action=index status=200 duration=66.36 view=11.92"

[ N 2024-02-15 22:00:08.8962 2309788/T8 age/Cor/CoreMain.cpp:670 ]: Signal received. Gracefully shutting down… (send signal 2 more time(s) to force shutdown)
[ N 2024-02-15 22:00:08.8962 2309788/T1 age/Cor/CoreMain.cpp:1245 ]: Received command to shutdown gracefully. Waiting until all clients have disconnected…
[ N 2024-02-15 22:00:08.8962 2309788/T1 age/Cor/CoreMain.cpp:1146 ]: Checking whether to disconnect long-running connections for process 2309820, application /var/www/ood/apps/sys/dashboard (production)
[ N 2024-02-15 22:00:08.8963 2309788/T8 Ser/Server.h:901 ]: [ServerThr.1] Freed 0 spare client objects
[ N 2024-02-15 22:00:08.8963 2309788/Ta Ser/Server.h:901 ]: [ServerThr.2] Freed 0 spare client objects
[ N 2024-02-15 22:00:08.8963 2309788/Tc Ser/Server.h:901 ]: [ServerThr.3] Freed 0 spare client objects
[ N 2024-02-15 22:00:08.8963 2309788/T8 Ser/Server.h:558 ]: [ServerThr.1] Shutdown finished
[ N 2024-02-15 22:00:08.8963 2309788/Ta Ser/Server.h:558 ]: [ServerThr.2] Shutdown finished
[ N 2024-02-15 22:00:08.8963 2309788/Tk Ser/Server.h:901 ]: [ServerThr.7] Freed 0 spare client objects
[ N 2024-02-15 22:00:08.8963 2309788/Tm Ser/Server.h:901 ]: [ServerThr.8] Freed 0 spare client objects
[ N 2024-02-15 22:00:08.8963 2309788/Te Ser/Server.h:901 ]: [ServerThr.4] Freed 0 spare client objects
[ N 2024-02-15 22:00:08.8963 2309788/Tk Ser/Server.h:558 ]: [ServerThr.7] Shutdown finished
[ N 2024-02-15 22:00:08.8963 2309788/Ti Ser/Server.h:901 ]: [ServerThr.6] Freed 0 spare client objects
[ N 2024-02-15 22:00:08.8964 2309788/Tm Ser/Server.h:558 ]: [ServerThr.8] Shutdown finished
[ N 2024-02-15 22:00:08.8964 2309788/Te Ser/Server.h:558 ]: [ServerThr.4] Shutdown finished
[ N 2024-02-15 22:00:08.8963 2309788/Tc Ser/Server.h:558 ]: [ServerThr.3] Shutdown finished
[ N 2024-02-15 22:00:08.8964 2309788/Ti Ser/Server.h:558 ]: [ServerThr.6] Shutdown finished
[ N 2024-02-15 22:00:08.8963 2309788/Tg Ser/Server.h:901 ]: [ServerThr.5] Freed 0 spare client objects
[ N 2024-02-15 22:00:08.8964 2309788/Tg Ser/Server.h:558 ]: [ServerThr.5] Shutdown finished
[ N 2024-02-15 22:00:08.8964 2309788/To Ser/Server.h:901 ]: [ApiServer] Freed 0 spare client objects
[ N 2024-02-15 22:00:08.8964 2309788/To Ser/Server.h:558 ]: [ApiServer] Shutdown finished
[ N 2024-02-15 22:00:08.8965 2309788/T1 age/Cor/CoreMain.cpp:1146 ]: Checking whether to disconnect long-running connections for process 2309820, application /var/www/ood/apps/sys/dashboard (production)
[ E 2024-02-15 22:00:09.4187 2309788/T1 age/Cor/TelemetryCollector.h:454 ]: Error contacting anonymous telemetry server: OpenSSL SSL_connect: Connection reset by peer in connection to anontelemetry.phusionpassenger.com:443
[ N 2024-02-15 22:00:09.5099 2309788/T1 age/Cor/CoreMain.cpp:1325 ]: Passenger core shutdown finished
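The execve line in the log above shows the dashboard polling squeue for job 125646 with a custom -o format: each record starts with the ASCII record separator (\u001E) and fields are separated by the unit separator (\u001F), presumably so that free-text fields such as job names cannot be confused with the delimiters. The sketch below splits output in that layout. It is an illustration under that assumption, not Open OnDemand's own adapter code, and the helper names, the three field codes, and the sample output in the example are hypothetical.

```python
RS = "\x1e"  # ASCII record separator, shown as \u001E in the log
US = "\x1f"  # ASCII unit separator, shown as \u001F in the log


def build_format(codes):
    """Build an squeue -o format string: RS first, then %-codes joined by US."""
    return RS + US.join("%" + code for code in codes)


def parse_squeue(output, codes, has_header=True):
    """Split squeue output produced with build_format(codes) into dicts.

    Assumes every record starts with RS and fields are separated by US.
    When has_header is True, the first record is treated as squeue's
    column header and skipped.
    """
    # Text before the first RS is empty, so drop it; strip each line ending.
    records = [record.rstrip("\n") for record in output.split(RS)[1:]]
    if has_header:
        records = records[1:]
    rows = []
    for record in records:
        fields = record.split(US)
        if len(fields) != len(codes):
            raise ValueError(f"expected {len(codes)} fields, got {len(fields)}")
        rows.append(dict(zip(codes, fields)))
    return rows


if __name__ == "__main__":
    codes = ["A", "j", "T"]  # job ID, job name, extended state
    sample = "\x1eJOBID\x1fNAME\x1fSTATE\n\x1e125646\x1fmy desktop\x1fRUNNING\n"
    print(repr(build_format(codes)))
    print(parse_squeue(sample, codes))
```

Because the logged command has no --noheader flag, the sketch treats the first record as a header by default; pass has_header=False for output without one.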

Version 3.1, which is being released today, has better shell connectivity. I don’t think there’s anything you can do about this in 3.0 or earlier.

Thanks, I will update to version 3.1. Also, do you have any suggestions for batch_connect (Linux XFCE Desktop) sessions that terminate with an exit status of 0 after 48 hours when the Slurm job requests 75 hours? A normal Slurm job that launches a bash shell does not have this problem and runs for the full 75 hours. The TurboVNC package is turbovnc-3.0.91-20230818.x86_64, with stock default settings on RHEL 9. Could something on the login/head node be ending the job, or should I be looking at the compute node? How would you troubleshoot this type of problem?

Not sure if this is your issue, but I found that mod_proxy by default times out connections between a VNC client and the frontend, leaving the web client with a “blank” screen and a noVNC reconnect button (which won’t work because of the one-time password).

This can be fixed by increasing the timeout, for example:
echo "ProxyTimeout 3600" > /etc/httpd/conf.d/proxytimeout.conf
… and restart httpd.
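A plausible explanation for the one-minute symptom in the original question: in Apache httpd, ProxyTimeout defaults to the value of the core Timeout directive, and Timeout defaults to 60 seconds. To check which value a server is likely using, a rough Python sketch can scan the config files in load order. It assumes a flat configuration in which the last ProxyTimeout or Timeout line wins, and it ignores <VirtualHost> scoping and nested Include ordering, so treat its answer as a hint rather than a verdict. The function name is hypothetical, and the paths in the __main__ block are the stock RHEL locations, which may differ on your system.

```python
import glob
import re
from pathlib import Path

# httpd's documented default for the core Timeout directive, in seconds.
APACHE_DEFAULT_TIMEOUT = 60

# Matches an uncommented "ProxyTimeout N" or "Timeout N" line; directive names are case-insensitive.
_DIRECTIVE = re.compile(r"^\s*(ProxyTimeout|Timeout)\s+(\d+)\s*$", re.IGNORECASE)


def effective_proxy_timeout(config_texts):
    """Estimate mod_proxy's timeout from config file contents given in load order.

    Simplified model: the last matching line wins and <VirtualHost> scoping
    is ignored. ProxyTimeout falls back to Timeout, which falls back to 60.
    """
    timeout = APACHE_DEFAULT_TIMEOUT
    proxy_timeout = None
    for text in config_texts:
        for line in text.splitlines():
            match = _DIRECTIVE.match(line)
            if not match:
                continue  # comments and unrelated directives are skipped
            value = int(match.group(2))
            if match.group(1).lower() == "proxytimeout":
                proxy_timeout = value
            else:
                timeout = value
    return proxy_timeout if proxy_timeout is not None else timeout


if __name__ == "__main__":
    # Stock RHEL httpd.conf pulls in conf.d/*.conf near its end, in alphabetical order.
    paths = ["/etc/httpd/conf/httpd.conf", *sorted(glob.glob("/etc/httpd/conf.d/*.conf"))]
    texts = [Path(p).read_text(errors="replace") for p in paths if Path(p).is_file()]
    print(f"estimated proxy timeout: {effective_proxy_timeout(texts)} s")
```

With no ProxyTimeout or Timeout line anywhere, the sketch reports 60 seconds, which lines up with a one-minute idle disconnect; after adding the proxytimeout.conf file above, it should report 3600.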

As for the job dying after 48 hours, I think the output.log of the session is your best bet for finding information.
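For a first pass over a long output.log, a small Python sketch can print any lines containing a few keywords plus the last lines of the file, which is usually where the shutdown of the desktop or VNC server gets recorded. The keyword list and the scan_log name are hypothetical starting points, not messages that OOD, TurboVNC, or Slurm are known to emit, and the log path is passed on the command line rather than assumed.

```python
import sys
from collections import deque

# Hypothetical keywords worth a closer look; adjust for your site.
KEYWORDS = ("error", "killed", "cancelled", "time limit", "signal", "exit")


def scan_log(lines, tail=20, keywords=KEYWORDS):
    """Return (matches, last_lines) for an iterable of log lines.

    matches: (line_number, line) pairs whose text contains any keyword,
    compared case-insensitively. last_lines: the final `tail` lines.
    """
    matches = []
    last_lines = deque(maxlen=tail)  # keeps only the most recent `tail` lines
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        lowered = line.lower()
        if any(keyword in lowered for keyword in keywords):
            matches.append((number, line))
        last_lines.append(line)
    return matches, list(last_lines)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python scan_log.py /path/to/output.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        found, last = scan_log(handle)
    for number, line in found:
        print(f"{number}: {line}")
    print("--- last lines ---")
    print("\n".join(last))
```

Streaming the file through a bounded deque keeps memory flat even for a session that logged for 48 hours.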
