Change nc -w 2 to nc -z?

I’m relatively new to OOD, and during a Jupyter app install, we’ve discovered that the app seems to be waiting on a ‘nc -w 2 {host} {port}’ call. We’re on CentOS 7. The call works on an older system running CentOS 6, which has netcat installed. The implementation of nc on the CentOS 7 system seems to want ‘nc -z -w 2 {host} {port}’ in order to exit with a zero return value when the connection is open.

Assuming that my analysis of the problem is correct, what is the best or most appropriate way to make that change?

Thanks!

Hi! and welcome to OOD.

I think there may be something else going on here. We actually execute nc -w 2 "${host}" "${port}" < /dev/null &> /dev/null, so technically there is I/O: stdin comes from /dev/null and all output is discarded. I had to play around with this a bit and look at the code to be 100% sure.

We run RHEL 7 with this version of netcat. I also confirmed it was the same version on centos:7. I’m guessing it’s what you see too.
Ncat: Version 7.50 ( https://nmap.org/ncat )

Were you able to confirm the command runs manually? For example, get 2 sessions on the same compute node. In one, run nc -lk localhost 33047 (just a random port I picked, you can choose a different one). In the other, run nc -w 2 localhost 33047 < /dev/null and check what happens.
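For anyone repeating this check, here is a minimal sketch of the probe wrapped in a shell function. The name probe_port is made up for illustration; the nc invocation mirrors the one quoted above. It assumes you run it on the compute node where the job actually lands, not on the OnDemand web host.

```shell
#!/bin/sh
# Hypothetical helper (not part of OnDemand): exit 0 only if the nc probe
# against host:port succeeds.
probe_port() {
  host="$1"
  port="$2"
  # If nc is missing, say so instead of failing silently.
  if ! command -v nc > /dev/null 2>&1; then
    echo "nc not found on this host" >&2
    return 127
  fi
  # stdin comes from /dev/null and all output is discarded, as in the
  # command quoted above; -w 2 sets a 2 second timeout.
  nc -w 2 "$host" "$port" < /dev/null > /dev/null 2>&1
}

# Session 1 (listener):  nc -lk localhost 33047
# Session 2 (probe):
if probe_port localhost 33047; then
  echo "port 33047 is open"
else
  echo "port 33047 is closed (or nc is unavailable)"
fi
```

The explicit command -v check is a design choice for debugging: with the output redirected, a missing nc would otherwise look exactly like a closed port.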

nc -w 2 {host} {port}

works as expected by OnDemand here. This is “/usr/bin/nc” from “nmap-ncat-6.40-19.el7.x86_64” on CentOS 7.7.1908

FWIW,

Ric

Thanks, I missed the redirection when I tested on the command line. I can confirm that it works as expected manually, but within the application it times out. I’m getting the “Timed out waiting for Jupyter Notebook server to open port…” message in the output log.

I can confirm we’re on the same version of Ncat:
Ncat: Version 7.50 ( https://nmap.org/ncat )

Apologies for the misdirection. What threw me was that the CentOS 6 nc implementation (which is netcat, I believe) worked without the redirection.

Any suggestions on how to track this down further?

Thanks!

No need to apologize at all! Can you post the relevant logs here? I’m specifically interested in the timing of the log messages. If it’s a clean 5 minutes between the boot up and the failure, that indicates netcat is failing immediately, which it shouldn’t be (600 tries * 0.5 seconds sleeping = 300 seconds).
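To make the timing argument concrete, here is a rough sketch of a retry loop of that shape. This is not the actual OnDemand code; the function name wait_until and its arguments are assumptions for illustration only.

```shell
#!/bin/sh
# Hypothetical retry loop, not the OnDemand implementation.
# Usage: wait_until TRIES DELAY COMMAND [ARGS...]
wait_until() {
  tries="$1"
  delay="$2"
  shift 2
  i=0
  while [ "$i" -lt "$tries" ]; do
    # Stop as soon as the probe command succeeds.
    if "$@"; then
      return 0
    fi
    # Fractional delays such as 0.5 work with GNU sleep but are not POSIX.
    sleep "$delay"
    i=$((i + 1))
  done
  # Every attempt failed.
  return 1
}

# If each probe fails instantly, the total wait is roughly tries * delay:
# 600 * 0.5 s = 300 s, a clean 5 minutes.
# Example call: wait_until 600 0.5 nc -w 2 "$host" "$port"
```

If the probe itself blocks for a while before failing, the total time grows beyond tries * delay, which is why the spacing of the log timestamps is a useful clue.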

Here’s the file you need to hack to figure out more. Please back it up before making modifications and move the original back when you’re done. I’d suggest starting by removing the output redirection at the end of the nc command in the port_used function. That way you can see what’s being output there. I’d say that’ll be the best first step.
/var/www/ood/apps/sys/dashboard/vendor/bundle/ruby/2.4.0/gems/ood_core-0.9.3/lib/ood_core/batch_connect/template.rb

@azric thanks for jumping in!

It takes about 1 minute before it fails. I’ll post the log below.

I tried editing the template file earlier, but the changes don’t seem to have any effect. Do I need to restart the service or anything?

Output log:

Script starting…
Waiting for Jupyter Notebook server to open port 26002…
TIMING - Starting wait at: Fri Nov 22 16:54:00 EST 2019
TIMING - Starting main script at: Fri Nov 22 16:54:00 EST 2019

Currently Loaded Modules:
  1) python/Anaconda3.7

TIMING - Starting jupyter at: Fri Nov 22 16:54:01 EST 2019

+ jupyter notebook --config=/home/jeff.dusenberry/ondemand/data/sys/dashboard/batch_connect/sys/jupyter/output/4c3e2bd6-f930-4d1d-a2fe-a659b1c2e5d5/config.py
[I 16:54:05.841 NotebookApp] JupyterLab extension loaded from /share/apps/Anaconda3.7/lib/python3.7/site-packages/jupyterlab
[I 16:54:05.841 NotebookApp] JupyterLab application directory is /share/apps/Anaconda3.7/share/jupyter/lab
[I 16:54:05.846 NotebookApp] Serving notebooks from local directory: /home/jeff.dusenberry
[I 16:54:05.846 NotebookApp] The Jupyter Notebook is running at:
[I 16:54:05.846 NotebookApp] http://(chimera02.chimera.local or 127.0.0.1):26002/node/chimera02.chimera.local/26002/
[I 16:54:05.846 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
Timed out waiting for Jupyter Notebook server to open port 26002 on host chimera02.chimera.local!
TIMING - Wait ended at: Fri Nov 22 16:55:06 EST 2019
[C 16:55:06.433 NotebookApp] received signal 15, stopping
[I 16:55:06.471 NotebookApp] Shutting down 0 kernels
Cleaning up…
/home/jeff.dusenberry/ondemand/data/sys/dashboard/batch_connect/sys/jupyter/output/4c3e2bd6-f930-4d1d-a2fe-a659b1c2e5d5/script.sh: line 27: 28255 Terminated jupyter notebook --config="${CONFIG_FILE}"

Thanks,
Jeff

Yeah, you’ll need to bounce your PUN (per-user Nginx) at least. If your site has some sort of reaper in a crontab, it may have died over the weekend. Otherwise, if you have developer mode enabled (and you are a developer), you can bounce it in the web interface: at the top right, open the ‘develop’ dropdown and choose restart web server. As an aside, you can do this at any time because it only bounces your PUN, so it doesn’t affect other folks.

Thanks!

I think we figured out the original problem. The compute node did not have nc installed, so it silently failed to ever show the port in use. I had done all my manual testing from the ood server, which worked.
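For anyone hitting the same thing, here is a small sketch of why the failure was silent. The helper name quiet_status and the fake command name are made up for illustration; the point is only that a missing binary plus discarded output leaves nothing in the log.

```shell
#!/bin/sh
# Hypothetical helper: run a command with all output discarded and print
# only its exit status.
quiet_status() {
  "$@" > /dev/null 2>&1
  echo "$?"
}

# A missing command exits with 127 ("command not found"), but the message
# goes to stderr, which is discarded here, so nothing reaches the log.
status=$(quiet_status nc_that_is_not_installed localhost 26002)
echo "exit status: $status"   # prints "exit status: 127"

# Checking for the binary up front turns the silent failure into a visible one.
if ! command -v nc > /dev/null 2>&1; then
  echo "warning: nc is not installed on this node" >&2
fi
```

Running the command -v check on the compute node itself, for example from inside a test job, is what distinguishes this case from a genuinely closed port.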

Thanks for all your help with this.

Jeff

OK awesome, we’ve seen that before actually, I had just assumed you tested on the compute node (or at least didn’t want to question it).

Good luck with all the rest!