Interactive desktop freezing after a few seconds of use

We are using HAProxy to proxy to our OnDemand servers, and the interactive desktop freezes after a few seconds of use. I found the ticket that talks about setting up a heartbeat for both the shell application and the VNC/websockify connection. The shell connection seems to be working fine, but I am unsure what the number in --heartbeat=30 means. Is it in seconds or minutes, and is it the interval until the next heartbeat or the timeout for a heartbeat? HAProxy drops certain HTTP connections after a certain amount of time, and I think that is what is happening. The display freezes, yet users can still type. They can get back into the interactive desktop by exiting the VNC connection and reconnecting.

This seems to be intermittent at best. I had it happen to me within about 3 minutes but after closing that and relaunching the desktop I have not had it happen in the last half hour.

Looking at the man page for websockify I see my answer here:

--heartbeat=HEARTBEAT
send a ping to the client every HEARTBEAT seconds
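
So the value is the interval between pings, in seconds, not a timeout. A minimal Python sketch of the reasoning, assuming the proxy's idle timer is reset by any traffic on the connection and that a non-positive interval means no pings are sent (both are assumptions for illustration; `heartbeat_beats_timeout` is a hypothetical helper, not part of websockify or HAProxy):

```python
def heartbeat_beats_timeout(heartbeat_s: float, idle_timeout_s: float) -> bool:
    """Return True if a ping sent every heartbeat_s seconds arrives before
    an idle timeout of idle_timeout_s seconds would expire."""
    if heartbeat_s <= 0:
        # Assumption: a non-positive interval means no pings are sent,
        # so nothing resets the proxy's idle timer.
        return False
    # Strictly less than: an equal interval would race the timeout.
    return heartbeat_s < idle_timeout_s


if __name__ == "__main__":
    # Compare a few heartbeat intervals against a 60 second idle timeout.
    for heartbeat in (0, 30, 90):
        print(heartbeat, heartbeat_beats_timeout(heartbeat, 60))
```

With a 1 minute idle timeout, a 30 second heartbeat should in principle be frequent enough, which is why it is worth checking the proxy timeouts next.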

Hi, I suspect you’re right. I’ve created this ticket upstream to enable the heartbeat flag for websockify.

I am not sure if this is working yet. I’ve tried different values for the heartbeat and I am still getting freezes on interactive desktop sessions. I have a terminal open pinging another server and I still get the session freeze.

You may also need --auto-pong.

Yeah I saw that and wondered if I needed that. I’ll give it a try. I am also looking at the values of these timeouts in haproxy.

timeout http-request    50s
timeout queue           1m
timeout connect         10s
timeout client          1m
timeout server          1m
timeout http-keep-alive 30s
timeout check           50s
timeout tunnel          5m
timeout client-fin      30s

The ones I have changed in particular are http-keep-alive and client-fin. The tunnel timeout also applies to the shell/terminal connections to the submit nodes, but the heartbeat modification to app.js in the shell app fixed that.
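
For comparing these values against a heartbeat interval, here is a rough Python sketch that converts HAProxy time values to seconds and flags any timeout that is not longer than the heartbeat. HAProxy reads a bare number as milliseconds and accepts the suffixes us, ms, s, m, h and d. The helper names, the 30 second heartbeat and the shortened `CONFIG` sample are illustrative assumptions, not anything OnDemand or HAProxy ships:

```python
import re

# HAProxy time suffixes and their length in milliseconds.
# A bare number with no suffix is read as milliseconds.
_UNIT_MS = {"us": 0.001, "ms": 1, "s": 1000, "m": 60_000,
            "h": 3_600_000, "d": 86_400_000}


def parse_haproxy_time(value: str) -> float:
    """Convert an HAProxy time value such as '50s' or '5m' to seconds."""
    match = re.fullmatch(r"(\d+)(us|ms|s|m|h|d)?", value.strip())
    if match is None:
        raise ValueError(f"unrecognised time value: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNIT_MS[unit or "ms"] / 1000


def parse_timeouts(config_text: str) -> dict:
    """Collect 'timeout <name> <value>' lines into {name: seconds}."""
    timeouts = {}
    for line in config_text.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[0] == "timeout":
            timeouts[parts[1]] = parse_haproxy_time(parts[2])
    return timeouts


# Shortened sample using values from the post above (illustrative only).
CONFIG = """
timeout client          1m
timeout server          1m
timeout http-keep-alive 30s
timeout tunnel          5m
"""

if __name__ == "__main__":
    heartbeat_s = 30  # assumed websockify --heartbeat value
    for name, seconds in parse_timeouts(CONFIG).items():
        verdict = "ok" if heartbeat_s < seconds else "not longer than heartbeat"
        print(f"{name}: {seconds:g}s ({verdict})")
```

This only compares numbers. Which of these timeouts actually governs an established websocket depends on the HAProxy configuration, so treat the result as a starting point.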

I just had one instance where the terminal session stopped pinging but then a few seconds later caught up reporting pings.

Setting those two timeouts to 30s may have done it, but I also set --auto-pong. I will continue to test with these settings as they are and report back.

This only pushed the freeze out to about 30 minutes. I am wondering if TurboVNC is timing out the connection too. I see there is an -idletimeout {t} option for TurboVNC, but I do not know where I should put it.

The websocket starts on the compute node side. What on the OnDemand server side connects to that socket? I would think that the OnDemand server side (the client) would be the side that sends the pong for the heartbeat on the websockify websocket.

Can you place the idletimeout in the global turbovnc config file?

Where would that be? /opt/Turbovnc/? Or someplace else?

Found the settings. There was a different one, max-idle-timeout = {t}, that is similar but sets the maximum timeout for being idle. I am trying that now.

That didn’t work either.

I found where the -idletimeout is used. It is in the Xvnc server and it defaults to no timeout.
So that was a rabbit hole.

This is interesting. If I use the view-only shareable link, I can see that the session is still going even though the regular link has frozen up.

I don’t think the websockify heartbeat is working like the heartbeat for the shell connection to the submit servers. For the shell application I do not have to set the ‘timeout tunnel’ variable in HAProxy, but if I do not set it for the interactive desktop, the desktop stops after 1 minute, the default time for that timeout.

I am wondering if the noVNC app that runs in the browser has a way to send a heartbeat, or to send a pong for the heartbeat on the websockify socket.

We now believe that this had to do with the number of file descriptors available to the haproxy process. The per-user file descriptor limit defaulted to 1024. I have changed this to 65536 in the haproxy conf file. If we do not experience freezes on the interactive desktop, I will mark this last reply as the solution to the problem. I am very hopeful, because this morning I was able to have a session last for the full 2 hours requested for the job.
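
For anyone who wants to check the same thing on their own proxy host, here is a small Python sketch that reads the open-file limit of a running process from /proc/<pid>/limits on Linux. The function names and the `SAMPLE` text are illustrative assumptions (the 4096 hard limit in particular is made up for the example), and the PID of the haproxy process has to be looked up separately:

```python
import re


def max_open_files(limits_text: str) -> tuple:
    """Return (soft, hard) 'Max open files' limits parsed from the
    contents of /proc/<pid>/limits."""
    for line in limits_text.splitlines():
        match = re.match(r"Max open files\s+(\d+)\s+(\d+)", line)
        if match:
            soft, hard = match.groups()
            return int(soft), int(hard)
    raise ValueError("no numeric 'Max open files' line found")


def read_limits(pid: int) -> str:
    """Read /proc/<pid>/limits (Linux only; may require privileges)."""
    with open(f"/proc/{pid}/limits") as handle:
        return handle.read()


# Illustrative sample of the file layout, not output from the thread.
SAMPLE = """\
Limit                     Soft Limit           Hard Limit           Units
Max processes             127886               127886               processes
Max open files            1024                 4096                 files
"""

if __name__ == "__main__":
    soft, hard = max_open_files(SAMPLE)
    print(f"soft={soft} hard={hard}")
```

On a live system, `max_open_files(read_limits(pid))` with the haproxy PID would show whether a raised limit has actually taken effect for the running process.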
