This is using OOD 2.0.29 on CentOS 7.9.
This happens on both Edge and Chrome.
After we killed the job, I looked through the session logs but found nothing useful.
Is there a way to increase log verbosity, or are there logs other than the session one that should be checked?
Thanks Jeff. In the output.log I can see a single Setting VNC password...
And the password in the URL is the one in the connection.yml.
The only suspicious messages are these:
WebSocket server settings:
Listen on :61006
Flash security policy server
No SSL/TLS support (no cert file)
Backgrounding (daemon)
WARNING: no 'numpy' module, HyBi protocol will be slower
Traceback (most recent call last):
  File "/usr/bin/websockify", line 11, in <module>
    load_entry_point('websockify==0.8.0', 'console_scripts', 'websockify')()
  File "/usr/lib/python2.7/site-packages/websockify/websocketproxy.py", line 525, in websockify_init
    server.start_server()
  File "/usr/lib/python2.7/site-packages/websockify/websocket.py", line 973, in start_server
    tcp_keepintvl=self.tcp_keepintvl)
  File "/usr/lib/python2.7/site-packages/websockify/websocket.py", line 741, in socket
    sock.bind(addrs[0][4])
  File "/usr/lib64/python2.7/socket.py", line 224, in meth
    return getattr(self._sock,name)(*args)
socket.error: [Errno 98] Address already in use
Scanning VNC log file for user authentications…
I’m trying to understand where the job_script_content.sh is built so I can add more logs and tests in it. I’ve checked in /var/www/ood/apps/sys/bc_desktop but not all scripts are there.
I now understand this issue. It occurs on a VM that is shared across multiple connections, and on which we restrict the port range to only 10 ports:
vnc:
  min_port: 61001
  max_port: 61010
When a session ends, whether because the walltime expired, the session was deleted, or the job was killed, there are leftover processes like the ones below.
These accumulate over time and eventually exhaust the available ports, hence the error message:
socket.error: [Errno 98] Address already in use
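To confirm that the range is really exhausted, a quick probe can try to bind each port and report the ones that fail with "Address already in use". This is only a sketch: the range is copied from the vnc settings above, the function names are made up for illustration, and SO_REUSEADDR is set on the assumption that this mirrors what websockify does before it binds.

```python
import errno
import socket

# Port range copied from the vnc settings quoted in this thread.
MIN_PORT = 61001
MAX_PORT = 61010


def port_in_use(port, host=""):
    """Return True if binding to (host, port) fails with EADDRINUSE."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Assumption: set SO_REUSEADDR so that only sockets that are really
    # bound or listening count as busy, not ones lingering in TIME_WAIT.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    try:
        sock.bind((host, port))
    except OSError as exc:
        if exc.errno == errno.EADDRINUSE:
            return True
        raise
    finally:
        sock.close()
    return False


def busy_ports(first=MIN_PORT, last=MAX_PORT):
    """List every port in [first, last] that is already bound."""
    return [p for p in range(first, last + 1) if port_in_use(p)]


if __name__ == "__main__":
    busy = busy_ports()
    total = MAX_PORT - MIN_PORT + 1
    print("{} of {} ports in use: {}".format(len(busy), total, busy))
```

If this reports all 10 ports as busy on the shared VM while no sessions are running, the leftover processes are the likely holders.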
So how can we cleanly remove all processes started by the remote session once the job is finished? This is using OpenPBS. The most important one seems to be websockify.
One more thing: I don’t see any after.sh or clean.sh in the session’s output directory. Maybe this is where the cleanup code should be added? If so, how?
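As a starting point for whatever cleanup hook ends up being used, here is a rough sketch that only lists candidates: processes owned by the current user that have been reparented to PID 1 and whose command line mentions websockify. It reads /proc directly, so it is Linux only. The function names and the pattern list are assumptions for illustration, and it deliberately does not kill anything.

```python
import os

# Assumption: websockify is the process of interest here; add other
# substrings (for example the VNC server's name) as needed.
PATTERNS = ("websockify",)


def read_process(pid):
    """Return (pid, ppid, uid, cmdline) for one /proc entry, or None if it is gone."""
    try:
        with open("/proc/{}/stat".format(pid)) as f:
            stat = f.read()
        with open("/proc/{}/cmdline".format(pid), "rb") as f:
            raw = f.read()
        # Owner of the /proc entry, normally the process's effective UID.
        uid = os.stat("/proc/{}".format(pid)).st_uid
    except (IOError, OSError):
        return None
    # The command name in stat is parenthesised and may contain spaces,
    # so split after the last ")" to reach the state and ppid fields.
    fields = stat[stat.rfind(")") + 2:].split()
    ppid = int(fields[1])
    cmdline = raw.replace(b"\0", b" ").decode("utf-8", "replace").strip()
    return (pid, ppid, uid, cmdline)


def list_processes():
    """Snapshot every numeric entry under /proc."""
    entries = (read_process(int(name)) for name in os.listdir("/proc") if name.isdigit())
    return [entry for entry in entries if entry is not None]


def find_leftovers(processes, uid, patterns=PATTERNS):
    """Keep processes owned by uid, reparented to PID 1, that match a pattern."""
    return [
        proc for proc in processes
        if proc[2] == uid and proc[1] == 1 and any(pat in proc[3] for pat in patterns)
    ]


if __name__ == "__main__" and os.path.isdir("/proc"):
    for pid, ppid, uid, cmdline in find_leftovers(list_processes(), os.getuid()):
        print(pid, cmdline)
```

Filtering on PID 1 as the parent is a heuristic. It will also match a websockify that belongs to a session still running on the same VM, so the list should be checked against active sessions before anything is signalled.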
I’m not familiar with the PBS settings, but in Slurm I would point you to ProctrackType, which you would need to set to cgroup (proctrack/cgroup).
PBS may be tracking the processes it needs to clean up by parent process ID. If so, some of these would not be captured, because their parent process is 1 (systemd). These processes should, however, still be part of the job’s cgroup.
So cleaning processes by cgroup is better than by PPID.
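To illustrate the difference, here is a small sketch that lists the members of a cgroup by reading its cgroup.procs file, which works no matter what the parent PID of each process is. Where PBS places the job's cgroup depends on the site's cgroup configuration, so the directory is taken from the command line rather than guessed. The function names are made up for illustration.

```python
import os
import sys


def pids_in_cgroup(cgroup_dir):
    """Return the PIDs listed in <cgroup_dir>/cgroup.procs."""
    with open(os.path.join(cgroup_dir, "cgroup.procs")) as f:
        return [int(line) for line in f if line.strip()]


def parse_proc_cgroup(text):
    """Parse /proc/<pid>/cgroup content into (controllers, path) pairs.

    Each line has the form hierarchy-ID:controller-list:cgroup-path.
    """
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        _hierarchy, controllers, path = line.split(":", 2)
        pairs.append((controllers, path))
    return pairs


if __name__ == "__main__":
    if len(sys.argv) == 2:
        # Argument: the job's cgroup directory (site specific, not assumed here).
        for pid in pids_in_cgroup(sys.argv[1]):
            print(pid)
    elif os.path.exists("/proc/self/cgroup"):
        # With no argument, show which cgroups this process belongs to.
        with open("/proc/self/cgroup") as f:
            for controllers, path in parse_proc_cgroup(f.read()):
                print(controllers or "(unified)", path)
```

Reading /proc/<pid>/cgroup for one of the leftover websockify processes should show whether it is still inside a job cgroup at all. If it is, cleaning up by cgroup would catch it even though its parent is PID 1.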