Hey guys,
I guess this is a continuing issue, and I’m not entirely sure how to approach debugging it. As you know, I posted a question earlier about the Jupyter app failing to start while waiting for a server port, which we were able to resolve by clearing the web browser cache.
My RStudio Server, which was working just fine last week, has stopped working and keeps timing out waiting for a port to open. Our main cluster is split into two main sets of nodes: epyc and gpu. This is only happening on the “epyc” nodes; it works fine on the gpu nodes. It’s the exact same script being run, so I’m confused.
I actually managed to solve it. I took a look at all the output.log files and found that the failing sessions were trying to start up on one particular host, epyc012. That host is running a few dozen 2-cpu jobs. I added a --exclude= for that host and jobs are running again.
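
For anyone hitting the same thing, here is a rough sketch of that kind of log scan in Python. It assumes the host name shows up somewhere in each output.log and that nodes are named like epyc012 or gpu003. The function name `count_hosts` and the regex are made up for illustration; they are not part of OnDemand or the scheduler, so adjust both to match your own logs.

```python
import re
from collections import Counter
from pathlib import Path

# Assumption: node names look like "epyc012" or "gpu003".
# Change this pattern to match your cluster's naming scheme.
HOST_PATTERN = re.compile(r"\b(?:epyc|gpu)\d+\b")


def count_hosts(root, log_name="output.log"):
    """Count how many log files under `root` mention each host name."""
    counts = Counter()
    for log_path in Path(root).rglob(log_name):
        text = log_path.read_text(errors="replace")
        # A host is counted once per log file, however often it appears in it.
        counts.update(set(HOST_PATTERN.findall(text)))
    return counts
```

Calling `count_hosts(...)` on whichever directory holds the session output and then `.most_common()` on the result lists hosts by how many logs mention them. If one host dominates the failed sessions, that is a reasonable candidate to exclude while you investigate it.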