<App> failed to start waiting for port

Hey guys,
I guess this is a continuing issue, and I’m not entirely sure how to approach debugging it. As you know, I previously posted a question about the Jupyter app failing to start while waiting for a server port, which we resolved by clearing the web browser cache.

My RStudio Server app, which was working fine last week, has stopped working and keeps timing out while waiting for a port to open. Our main cluster is split into two node types: epyc and gpu. This only happens on the “epyc” nodes; it works fine on the GPU nodes. It’s the exact same script in both cases, so I’m confused.
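Some context on the error itself: roughly speaking, an interactive session's job script starts the server on some port and then waits for it to begin listening, giving up after a time limit. That wait is what times out. The Python sketch below illustrates that general polling pattern only; it is not Open OnDemand's actual implementation, and the function name `wait_for_port` and its parameters are hypothetical.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 0.5) -> bool:
    """Poll host:port until a TCP connection succeeds or the timeout expires.

    Hypothetical helper for illustration. Returns True as soon as something
    is listening on the port, or False if the deadline passes first.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # create_connection raises OSError (e.g. ConnectionRefusedError)
            # when nothing is listening yet.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Server not up yet; wait a bit and try again.
            time.sleep(interval)
    return False
```

For example, `wait_for_port("localhost", 8787, timeout=5)` returns False if nothing starts listening there within about five seconds. 8787 is only RStudio Server's usual default; a batch session will typically be assigned a different port.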

Kenny

Hey Kenny.

I’m going to start with the oversimplified question: what happens if you restart the epyc node?

Thanks,
-gerald

Not possible; every node (we have 20 epyc nodes) is running jobs.

I’m not showing any stale or zombie processes holding ports open across the cluster.

Kenny

Thanks Kenny.

I’ve searched Discourse and found some threads that may help you.

Can you please take a look at the results of this search, and let us know if any of these solutions help?

https://discourse.openondemand.org/search?q=waiting%20for%20port

Thanks,
-gerald

Been reading those for days.

I actually managed to solve it. I looked through all the output.log files and found that it kept trying to start on one particular host, epyc012, which is running a few dozen 2-CPU jobs. I added a --exclude= for that host, and jobs are running again.
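If you have many sessions to check, a quick tally of which host each output.log mentions can make a single problem node stand out. This is a minimal Python sketch under two assumptions: host names follow the `epyc` plus digits pattern seen in "epyc012", and the logs sit somewhere under a directory you pass in (OnDemand usually keeps per-session output under `~/ondemand/data/sys/dashboard/batch_connect`, but check your own site). The function `tally_hosts` is hypothetical.

```python
import re
from collections import Counter
from pathlib import Path

# Assumption: node names look like "epyc" followed by digits, e.g. "epyc012".
HOST_RE = re.compile(r"\bepyc\d+\b")


def tally_hosts(log_root: Path, pattern: re.Pattern = HOST_RE) -> Counter:
    """Count, per host, how many output.log files under log_root mention it."""
    counts: Counter = Counter()
    for log in log_root.rglob("output.log"):
        text = log.read_text(errors="replace")
        # Count each host at most once per file, since one log is one session.
        counts.update(set(pattern.findall(text)))
    return counts
```

Calling `tally_hosts(Path.home() / "ondemand").most_common(5)` would list the five hosts that appear in the most session logs; a node that dominates the list is a good candidate to exclude or investigate.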

Sorry for the false alarm, guys. :face_with_raised_eyebrow:

K

Very cool. Do you know why it kept hitting the one host?