When does vnc_clean in clusters.d run?

Hi all!

We need to clean up the stray X11 locks and sockets after our users’ Interactive Desktop jobs end. I saw some helpful info in another topic about using the vnc_clean hook in clusters.d.

My question is: when does the vnc_clean hook run? I assumed it ran at some point after the VNC session ended, but I’m running some tests, and it looks like the hook has already run during an active VNC session.

Thanks,
Ron

If you look in job_script_content.sh you’ll find it here (this is the default), which means it attempts to clean up old sessions before it launches a new vncserver, not after the session ends.

echo "Starting VNC server..."
for i in $(seq 1 10); do
  # Clean up any old VNC sessions that weren't cleaned before
  vncserver -list | awk '/^:/{system("kill -0 "$2" 2>/dev/null || vncserver -kill "$1)}'

That said, if you look at those lock files, they’re owned by the individual users. So when I launch my job, I’m only able to clean my own old lock files.

As that other topic indicates, you’ll need root privileges to delete all of the lock files.
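For what it’s worth, you can see the scope of what an unprivileged user can touch with something like this (just an illustration; the find invocation is mine, not anything shipped with OnDemand):

# List only the X11 lock files you own. /tmp is sticky, so removing
# someone else's /tmp/.X*-lock fails with "Operation not permitted".
find /tmp -maxdepth 1 -name '.X*-lock' -user "$USER"
# A root-run sweep (prolog/epilog or cron) is needed to clear everyone's.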


Seems like you could use a Slurm prologue (if you run Slurm) to do this work - either as the user, cleaning their own lock files, or as root, cleaning all of them (though you’d have to check that whatever you’re removing isn’t currently in use).
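Roughly, a root-run prolog could look like this (an untested sketch; the path is just an example, and the important part is checking the PID stored in each lock file before removing anything):

#!/bin/bash
# Hypothetical root-run Slurm prolog (e.g. Prolog=/etc/slurm/prolog.sh in slurm.conf).
# Each /tmp/.X<n>-lock stores the PID of the X/Xvnc server that created it;
# if that process is gone, the lock and its socket are stale.
for lock in /tmp/.X*-lock; do
  [ -e "$lock" ] || continue
  pid=$(tr -d '[:space:]' < "$lock")
  if [ -n "$pid" ] && ! kill -0 "$pid" 2>/dev/null; then
    display="${lock#/tmp/.X}"   # e.g. "1-lock"
    display="${display%-lock}"  # e.g. "1"
    rm -f "$lock" "/tmp/.X11-unix/X${display}"
  fi
done
exit 0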

Thanks Jeff, it’s good to know that’s a default behavior now.

I think our issue is that a given user doesn’t land Interactive Desktop jobs on the same node often enough, so the locks from previous jobs aren’t getting cleaned up very often.

Is there a reliable way to do the cleanup at the end of a job? I understand that clean.sh doesn’t always run, which makes sense given all the ways a job can be interrupted.

Slurm has facilities for doing things after jobs complete. There are similar things for other schedulers.

https://slurm.schedmd.com/prolog_epilog.html
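For reference, those hooks get wired up in slurm.conf, roughly like this (the script paths are just examples):

# slurm.conf (example paths - adjust for your site)
Prolog=/etc/slurm/prolog.sh            # runs on the node as the SlurmdUser (usually root) before the job
Epilog=/etc/slurm/epilog.sh            # same, but after the job finishes
TaskEpilog=/etc/slurm/task_epilog.sh   # runs as the job's user after each task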

A more complex option could be using FUSE to mount a temporary file system that gets cleaned up when the process/job completes (I think it gets cleaned up - I’m not 100% sure about FUSE mounts; I’ve never actually used them, I just know they exist).

We currently do some cleanup for other purposes in the Slurm epilogue. All of our current cleanup is for job allocations, and hence runs as the slurmd user. To use vncserver -list for cleanup, it seems like we would need to do it in the parts of the epilog that run as the job’s user?

Seems like it. If the user can clean up their own lock files, then over time you’d have zero lock files that aren’t in use. They’re in /tmp after all, so a reboot of the compute node may get rid of them all. Like I said, over time, with maintenance windows restarting your compute nodes and an epilogue that cleans up what the user has created, you should be in good shape.
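If you do go the run-as-the-user route, TaskEpilog= is the hook slurmstepd runs as the job’s user, so a sketch like this (untested; the path is just an example, and the awk check mirrors what the default job script already does) would let vncserver -list clean up that user’s dead sessions:

#!/bin/bash
# Hypothetical TaskEpilog script (TaskEpilog=/etc/slurm/task_epilog.sh).
# slurmstepd runs it as the job's user, so "vncserver -list" reports that
# user's sessions; any session whose Xvnc process is gone gets -kill'd,
# the same check the default job script does before launching a new server.
if command -v vncserver >/dev/null 2>&1; then
  vncserver -list 2>/dev/null | \
    awk '/^:/{system("kill -0 "$2" 2>/dev/null || vncserver -kill "$1)}'
fi
exit 0

Keep in mind it runs after every task of every job, not just the interactive desktops, so it should stay cheap and defensive like this.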

Thanks again. We’ve decided we’ll start cleaning out the old locks a bit more frequently than we reboot the nodes, and that should work fine for us. It’s not as if /tmp fills up; we just run out of ports.

We only saw this on our instructional cluster, just now at the beginning of the semester. The other research clusters are fine. It’s probably just an unusual usage pattern from the students at the start of this semester.
