Here's my best guess as to what's happening: the first nginx
starts and crashes, but either doesn't release the socket file or lingers as a zombie.
(1) I would try to find out which process is still using the socket. lsof
will tell you this. See if that process is still running and what's going on with it.
[jeff@518bd97864ee /]$ lsof /var/run/ondemand-nginx/jeff/passenger.sock
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
nginx 379 jeff 6u unix 0x00000000ff78144c 0t0 2523864 /var/run/ondemand-nginx/jeff/passenger.sock type=STREAM
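To follow up on the PID that lsof reports, here is a minimal sketch that reads the process state from /proc. It assumes a Linux host with /proc mounted; `proc_state` is a hypothetical helper name, and the PID is whatever lsof printed for you (379 in the output above).

```shell
#!/bin/bash
# Print the one-letter state of a PID as recorded in /proc/<pid>/status.
# "Z" means zombie; R, S and D are the usual running/sleeping states.
# proc_state is a hypothetical helper, not part of OnDemand or nginx.
proc_state() {
  local status_file="/proc/$1/status"
  [ -r "$status_file" ] || return 1
  awk '/^State:/ { print $2 }' "$status_file"
}

# Use the PID given on the command line, or this shell's own PID as a demo.
pid="${1:-$$}"
if state="$(proc_state "$pid")"; then
  case "$state" in
    Z*) echo "PID $pid is a zombie" ;;
    *)  echo "PID $pid is alive (state $state)" ;;
  esac
else
  echo "PID $pid no longer exists"
fi
```

If the process is gone but the socket file is still there, the file is stale. If it is a zombie, its parent has not reaped it yet.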
(2) I would also check if there are coredumps in /var/lib/systemd/coredump
or if there are any messages in journalctl
related to crashes.
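A rough sketch of that check, assuming systemd-coredump's usual `core.<command>.<uid>...` file naming (the naming and the `list_cores` helper are my assumptions, so adjust if your files look different). The coredumpctl and journalctl calls only run if those tools are installed.

```shell
#!/bin/bash
# List core files for a given command name in a coredump directory.
# Assumes file names start with "core.<command>." (an assumption).
list_cores() {
  local comm="$1" dir="${2:-/var/lib/systemd/coredump}"
  [ -d "$dir" ] || return 0
  find "$dir" -maxdepth 1 -type f -name "core.${comm}.*" | sort
}

list_cores nginx

# If the systemd tools are present, they give a more readable view.
if command -v coredumpctl >/dev/null 2>&1; then
  # May exit non-zero when nothing matches, hence the "|| true".
  coredumpctl --no-pager list nginx || true
fi
if command -v journalctl >/dev/null 2>&1; then
  journalctl --no-pager --since "1 hour ago" _COMM=nginx || true
fi
```

You may need root (or membership in the systemd-journal group) to see other users' entries.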
(3) I would do a spot check on ulimits. You could be hitting some limit. Specifically max # of files or processes.
[johrstrom ~()] $ ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 63375
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 4096
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
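Keep in mind that ulimit -a shows the limits of your interactive shell, and the nginx process may have been started with different ones. Here is a small sketch, assuming Linux's /proc/<pid>/limits layout, that reads the soft limits of a running process. `soft_limit` is a hypothetical helper name.

```shell
#!/bin/bash
# Print the soft limit for one row of /proc/<pid>/limits,
# e.g. "Max open files" or "Max processes".
# soft_limit is a hypothetical helper, not an existing tool.
soft_limit() {
  local pid="$1" name="$2"
  awk -v name="$name" '
    index($0, name) == 1 {
      rest = substr($0, length(name) + 1)  # drop the limit name
      split(rest, cols, " ")               # cols[1] is the soft limit
      print cols[1]
    }' "/proc/$pid/limits"
}

# Use the PID given on the command line, or this shell's own PID as a demo.
pid="${1:-$$}"
echo "open files soft limit: $(soft_limit "$pid" "Max open files")"
echo "processes soft limit:  $(soft_limit "$pid" "Max processes")"
echo "open fds right now:    $(ls "/proc/$pid/fd" | wc -l)"
```

Run it with the nginx PID from the lsof output as the first argument. If the open fd count is close to the soft limit, that would be worth chasing.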
(4) After all that, we could employ strace to dig into what it's doing (and failing to do).
Here's a simple nginx wrapper; to use it, point the nginx_bin
setting in your /etc/ood/config/nginx_stage.yml
file at the wrapper script.
(You could parse a user name out of the -c /var/lib/ondemand-nginx/config/puns/<USER>.conf
option if you'd like to use it in the filename. I put the $(date +%s)
here to get unique files, given you're going to get one that we're interested in, the very first, and the others will throw the error in the message you've given about the address already being in use.)
#!/bin/bash
# Trace nginx, writing each invocation to its own timestamped file.
/bin/strace -o /tmp/nginx_strace_$(date +%s).out /opt/ood/ondemand/root/usr/sbin/nginx "$@"
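If you do want the user name in the trace filename, here is one way the wrapper could pull it out of the -c option. This is only a sketch: `user_from_args` is a hypothetical helper, and it assumes nginx_stage always passes arguments and that the config file is named <USER>.conf as in the path above.

```shell
#!/bin/bash
# Variant of the wrapper that names the trace file after the PUN user.
# user_from_args is a hypothetical helper, not part of nginx_stage.
user_from_args() {
  local prev="" arg
  for arg in "$@"; do
    if [ "$prev" = "-c" ]; then
      basename "$arg" .conf   # .../puns/jeff.conf -> jeff
      return 0
    fi
    prev="$arg"
  done
  echo "unknown"              # no -c option was given
}

# Only trace when called with arguments (assumed to always be the case
# when nginx_stage invokes the wrapper).
if [ "$#" -gt 0 ]; then
  user="$(user_from_args "$@")"
  /bin/strace -o "/tmp/nginx_strace_${user}_$(date +%s).out" \
    /opt/ood/ondemand/root/usr/sbin/nginx "$@"
fi
```

The timestamp stays in the name so that repeated starts for the same user still land in separate files.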