Load Balanced, Apache, Ultramonkey. [TIME_WAIT Problem]

Started by eduardo.r.t, February 04, 2009, 15:13


eduardo.r.t

Hi everyone, I have been working a lot with high availability lately.

I am running into a problem that I believe you, my fellow forum members, can help me with.

I followed the documentation below to implement the solution:

http://www.howtoforge.com/set-up-a-loadbalanced-ha-apache-cluster-ubuntu8.04

The problem is the following:

Ultramonkey checks every two seconds whether the page ldirector.html is available. If the page becomes unavailable, it marks that machine's HTTP service as having problems and sends requests only to the machine that is still active.
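
To make the health check described above concrete, here is a minimal Python sketch of one probe. It is not ldirectord's actual implementation. The names `response_is_healthy` and `check_real_server` are hypothetical, the page name and the "Test Page" marker come from the configuration posted in this thread, and the plain substring match is a simplification.

```python
import http.client
import urllib.request


def response_is_healthy(body: str, expected: str = "Test Page") -> bool:
    """Return True when the expected marker text appears in the body."""
    return expected in body


def check_real_server(host: str, port: int = 80, page: str = "ldirector.html",
                      expected: str = "Test Page", timeout: float = 10.0) -> bool:
    """Fetch the check page once and report whether the server looks alive.

    Each call opens a fresh TCP connection and closes it when done,
    so every probe leaves one short-lived connection behind.
    """
    url = f"http://{host}:{port}/{page}"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, HTTP error status or broken response.
        return False
    return response_is_healthy(body, expected)
```

A monitor would call `check_real_server("10.0.0.122")` once per check interval and take the server out of rotation when it returns False.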

For some reason I cannot figure out, at a certain point Apache starts piling up these connections without closing them, leaving a huge number of connections open between Ultramonkey and Apache.

Once this starts happening, top stops working and I can no longer connect to the server, as if Ultramonkey were running a DoS attack against my Apache.
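
One way to see what state the piled-up connections are in is to count sockets per TCP state on the Apache machine. The sketch below parses text in the usual `netstat -ant` column layout. The function name `count_tcp_states` is hypothetical and the sample lines are invented for illustration (the 192.0.2.x peers are placeholders, not addresses from this setup).

```python
from collections import Counter

# Made-up sample in `netstat -ant` layout, for illustration only.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN
tcp        0      0 10.0.0.122:80           192.0.2.10:41001        TIME_WAIT
tcp        0      0 10.0.0.122:80           192.0.2.10:41002        TIME_WAIT
tcp        0      0 10.0.0.122:80           192.0.2.10:41003        ESTABLISHED
tcp        0      0 10.0.0.122:22           192.0.2.20:50000        ESTABLISHED
"""


def count_tcp_states(netstat_output: str, local_port: int = 80) -> Counter:
    """Count TCP states for sockets whose local address ends in :local_port."""
    counts = Counter()
    suffix = f":{local_port}"
    for line in netstat_output.splitlines():
        fields = line.split()
        # Columns: Proto Recv-Q Send-Q Local-Address Foreign-Address State
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue  # skips the header and any non-TCP lines
        if fields[3].endswith(suffix):
            counts[fields[5]] += 1
    return counts


if __name__ == "__main__":
    for state, n in count_tcp_states(SAMPLE).most_common():
        print(state, n)
```

On the live server the input could come from `subprocess.run(["netstat", "-ant"], capture_output=True, text=True).stdout`. A count dominated by TIME_WAIT points to connections that were closed normally and are waiting to expire, while many ESTABLISHED or CLOSE_WAIT entries would point to connections that are not being closed at all.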

I have already tested with only one Apache machine and with only one Ultramonkey machine. It does not help; the problem always happens.

I will post the Ultramonkey logs and the logs of the heartbeat that runs between the Ultramonkeys.

The error started between 8:51 and 9:02.
By then I already had more than 5,000 connections on port 80.

[APACHE] /var/log/ha-log

heartbeat[4646]: 2009/01/30_17:45:25 WARN: Gmain_timeout_dispatch: Dispatch function for check for signals was delayed 4396940 ms (> 1010 ms) before being called (GSource: 0x811b0e8)
heartbeat[4646]: 2009/01/30_17:45:25 info: Gmain_timeout_dispatch: started at 4307228265 should have started at 4306788571
heartbeat[4646]: 2009/01/30_17:45:25 WARN: Gmain_timeout_dispatch: Dispatch function for update msgfree count was delayed 4397590 ms (> 10000 ms) before being called (GSource: 0x811b1b8)
heartbeat[4646]: 2009/01/30_17:45:25 info: Gmain_timeout_dispatch: started at 4307228265 should have started at 4306788506
heartbeat[4646]: 2009/01/30_17:45:25 WARN: Gmain_timeout_dispatch: Dispatch function for client audit was delayed 4391760 ms (> 5000 ms) before being called (GSource: 0x811b018)
heartbeat[4646]: 2009/01/30_17:45:25 info: Gmain_timeout_dispatch: started at 4307228265 should have started at 4306789089
heartbeat[4646]: 2009/01/31_08:51:12 info: Daily informational memory statistics
heartbeat[4646]: 2009/01/31_08:51:12 info: MSG stats: 250/94109 ms age 0 [pid4646/MST_CONTROL]
heartbeat[4646]: 2009/01/31_08:51:12 info: cl_malloc stats: 7885/3053206 580260/233792 [pid4646/MST_CONTROL]
heartbeat[4646]: 2009/01/31_08:51:12 info: RealMalloc stats: 596708 total malloc bytes. pid [4646/MST_CONTROL]
heartbeat[4646]: 2009/01/31_08:51:12 info: Current arena value: 0
heartbeat[4646]: 2009/01/31_08:51:12 info: MSG stats: 0/3 ms age 172679140 [pid4677/HBFIFO]
heartbeat[4646]: 2009/01/31_08:51:12 info: cl_malloc stats: 315/412 30556/13710 [pid4677/HBFIFO]
heartbeat[4646]: 2009/01/31_08:51:12 info: RealMalloc stats: 32660 total malloc bytes. pid [4677/HBFIFO]
heartbeat[4646]: 2009/01/31_08:51:12 info: Current arena value: 0
heartbeat[4646]: 2009/01/31_08:51:12 info: MSG stats: 0/0 ms age 172559120 [pid4678/HBWRITE]
heartbeat[4646]: 2009/01/31_08:51:12 info: cl_malloc stats: 334/97649 33048/15406 [pid4678/HBWRITE]
heartbeat[4646]: 2009/01/31_08:51:12 info: RealMalloc stats: 41584 total malloc bytes. pid [4678/HBWRITE]
heartbeat[4646]: 2009/01/31_08:51:12 info: Current arena value: 0
heartbeat[4646]: 2009/01/31_08:51:12 info: MSG stats: 0/0 ms age 172559120 [pid4679/HBREAD]
heartbeat[4646]: 2009/01/31_08:51:12 info: cl_malloc stats: 334/386 24920/11353 [pid4679/HBREAD]
heartbeat[4646]: 2009/01/31_08:51:12 info: RealMalloc stats: 25004 total malloc bytes. pid [4679/HBREAD]
heartbeat[4646]: 2009/01/31_08:51:12 info: Current arena value: 0
heartbeat[4646]: 2009/01/31_08:51:12 info: MSG stats: 0/176837 ms age 1900 [pid4680/HBWRITE]
heartbeat[4646]: 2009/01/31_08:51:12 info: cl_malloc stats: 346/4527045 34568/16438 [pid4680/HBWRITE]
heartbeat[4646]: 2009/01/31_08:51:12 info: RealMalloc stats: 46624 total malloc bytes. pid [4680/HBWRITE]
heartbeat[4646]: 2009/01/31_08:51:12 info: Current arena value: 0
heartbeat[4646]: 2009/01/31_08:51:12 info: MSG stats: 0/1466 ms age 169877960 [pid4681/HBREAD]
heartbeat[4646]: 2009/01/31_08:51:12 info: cl_malloc stats: 347/29731 34652/16482 [pid4681/HBREAD]
heartbeat[4646]: 2009/01/31_08:51:12 info: RealMalloc stats: 36068 total malloc bytes. pid [4681/HBREAD]
heartbeat[4646]: 2009/01/31_08:51:12 info: Current arena value: 0
heartbeat[4646]: 2009/01/31_08:51:12 info: These are nothing to worry about.
heartbeat[4657]: 2009/02/02_11:12:28 WARN: Core dumps could be lost if multiple dumps occur.
heartbeat[4657]: 2009/02/02_11:12:28 WARN: Consider setting non-default value in /proc/sys/kernel/core_patt
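
The "was delayed ... ms" warnings in the log above are easier to judge once converted to minutes. This is a small sketch that pulls the delay out of one such line. The name `parse_delay` and the regular expression are my own, written against the line format shown here.

```python
import re

# Matches e.g. "Dispatch function for check for signals was delayed 4396940 ms"
DELAY_RE = re.compile(
    r"Dispatch function for (?P<name>.+?) was delayed (?P<ms>\d+) ms"
)


def parse_delay(line: str):
    """Return (timer name, delay in minutes), or None if the line has no delay."""
    match = DELAY_RE.search(line)
    if match is None:
        return None
    return match.group("name"), int(match.group("ms")) / 60000.0


if __name__ == "__main__":
    line = (
        "heartbeat[4646]: 2009/01/30_17:45:25 WARN: Gmain_timeout_dispatch: "
        "Dispatch function for check for signals was delayed 4396940 ms "
        "(> 1010 ms) before being called (GSource: 0x811b0e8)"
    )
    name, minutes = parse_delay(line)
    print(name, round(minutes, 1))  # prints: check for signals 73.3
```

A delay of about 73 minutes on a timer that should fire roughly every second suggests that the heartbeat process, and possibly the whole machine, was stalled for that long before 17:45:25 on January 30.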

[UltraMonkey] /var/log/ultramonkey.log
[Thu Jan 29 08:53:00 2009|ldirectord.cf|4483] Restored real server: 10.0.0.122:80 (10.0.0.143:80) (Weight set to 1)
[Thu Jan 29 08:53:00 2009|ldirectord|4995] Restored real server: 10.0.0.122:80 (10.0.0.143:80) (Weight set to 1)
[Thu Jan 29 08:53:00 2009|ldirectord.cf|4483] Restored real server: 10.0.0.122:443 (10.0.0.143:443) (Weight set to 1)
[Thu Jan 29 08:53:00 2009|ldirectord.cf|4483] Restored real server: 10.0.0.122:10002 (10.0.0.143:10002) (Weight set to 1)
[Thu Jan 29 09:41:52 2009|ldirectord.cf|4483] Quiescent real server: 10.0.0.132:80 (10.0.0.143:80) (Weight set to 0)
[Thu Jan 29 09:42:02 2009|ldirectord.cf|4483] Quiescent real server: 10.0.0.132:443 (10.0.0.143:443) (Weight set to 0)
[Thu Jan 29 09:42:12 2009|ldirectord.cf|4483] Quiescent real server: 10.0.0.132:10002 (10.0.0.143:10002) (Weight set to 0)
[Sat Jan 31 09:05:08 2009|ldirectord.cf|4483] Quiescent real server: 10.0.0.122:80 (10.0.0.143:80) (Weight set to 0)
[Sat Jan 31 09:05:08 2009|ldirectord|4995] Quiescent real server: 10.0.0.122:80 (10.0.0.143:80) (Weight set to 0)
[Sat Jan 31 09:05:21 2009|ldirectord.cf|4483] Quiescent real server: 10.0.0.122:443 (10.0.0.143:443) (Weight set to 0)
[Sat Jan 31 09:05:21 2009|ldirectord|4995] Quiescent real server: 10.0.0.122:443 (10.0.0.143:443) (Weight set to 0)
[Sat Jan 31 09:05:34 2009|ldirectord.cf|4483] Quiescent real server: 10.0.0.122:10002 (10.0.0.143:10002) (Weight set to 0)
[Sat Jan 31 09:05:34 2009|ldirectord|4995] Quiescent real server: 10.0.0.122:10002 (10.0.0.143:10002) (Weight set to 0)
[Sat Jan 31 09:12:17 2009|ldirectord|4995] Restored real server: 10.0.0.122:443 (10.0.0.143:443) (Weight set to 1)
[Sat Jan 31 09:12:17 2009|ldirectord.cf|4483] Restored real server: 10.0.0.122:443 (10.0.0.143:443) (Weight set to 1)
[Sat Jan 31 09:12:20 2009|ldirectord.cf|4483] Restored real server: 10.0.0.122:10002 (10.0.0.143:10002) (Weight set to 1)
[Sat Jan 31 09:12:20 2009|ldirectord|4995] Restored real server: 10.0.0.122:10002 (10.0.0.143:10002) (Weight set to 1)
[Sat Jan 31 09:12:25 2009|ldirectord.cf|4483] Restored real server: 10.0.0.122:80 (10.0.0.143:80) (Weight set to 1)
[Sat Jan 31 09:12:25 2009|ldirectord|4995] Restored real server: 10.0.0.122:80 (10.0.0.143:80) (Weight set to 1)
[Mon Feb 2 11:17:33 2009|ldirectord|4995] Quiescent real server: 10.0.0.122:10002 (10.0.0.143:10002) (Weight set to 0)
[Mon Feb 2 11:17:33 2009|ldirectord.cf|4483] Quiescent real server: 10.0.0.122:10002 (10.0.0.143:10002) (Weight set to 0)
[Mon Feb 2 11:17:47 2009|ldirectord|4995] Quiescent real server: 10.0.0.122:80 (10.0.0.143:80) (Weight set to 0)
[Mon Feb 2 11:17:47 2009|ldirectord.cf|4483] Quiescent real server: 10.0.0.122:80 (10.0.0.143:80) (Weight set to 0)
[Mon Feb 2 11:17:50 2009|ldirectord|4995] Quiescent real server: 10.0.0.122:443 (10.0.0.143:443) (Weight set to 0)
[Mon Feb 2 11:17:50 2009|ldirectord.cf|4483] Quiescent real server: 10.0.0.122:443 (10.0.0.143:443) (Weight set to 0)
[Mon Feb 2 11:18:07 2009|ldirectord.cf|4483] Restored real server: 10.0.0.122:80 (10.0.0.143:80) (Weight set to 1)
[Mon Feb 2 11:18:07 2009|ldirectord|4995] Restored real server: 10.0.0.122:80 (10.0.0.143:80) (Weight set to 1)
[Mon Feb 2 11:19:44 2009|ldirectord.cf|4483] Restored real server: 10.0.0.122:10002 (10.0.0.143:10002) (Weight set to 1)
[Mon Feb 2 11:19:44 2009|ldirectord|4995] Restored real server: 10.0.0.122:10002 (10.0.0.143:10002) (Weight set to 1)
[Mon Feb 2 11:19:52 2009|ldirectord|4995] Restored real server: 10.0.0.122:443 (10.0.0.143:443) (Weight set to 1)
[Mon Feb 2 11:19:52 2009|ldirectord.cf|4483] Restored real server: 10.0.0.122:443 (10.0.0.143:443) (Weight set to 1)
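
To see how long each real server stayed out of rotation, the Quiescent and Restored entries above can be paired up. The sketch below does that. `outage_windows` is a hypothetical helper, the regular expression is written against the line format in this log, and the duplicated entries (the same event is logged under two tags) are collapsed by keeping only the first Quiescent per server.

```python
import re
from datetime import datetime

TS_FORMAT = "%a %b %d %H:%M:%S %Y"  # e.g. "Sat Jan 31 09:05:08 2009"
LINE_RE = re.compile(
    r"\[(?P<ts>[^|\]]+)\|[^|\]]+\|\d+\]\s+"
    r"(?P<event>Quiescent|Restored) real server: (?P<real>\S+)"
)


def outage_windows(log_text: str):
    """Return a list of (real server, start, end, seconds) per outage."""
    down_since = {}
    windows = []
    for line in log_text.splitlines():
        match = LINE_RE.search(line)
        if match is None:
            continue
        when = datetime.strptime(match.group("ts").strip(), TS_FORMAT)
        real = match.group("real")
        if match.group("event") == "Quiescent":
            # Keep the first timestamp; duplicate log entries are ignored.
            down_since.setdefault(real, when)
        else:
            start = down_since.pop(real, None)
            if start is not None:  # skip a Restored with no earlier Quiescent
                seconds = int((when - start).total_seconds())
                windows.append((real, start, when, seconds))
    return windows
```

Fed the January 31 entries for 10.0.0.122:80 (Quiescent at 09:05:08, Restored at 09:12:25), it reports a window of 437 seconds, a little over seven minutes.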




[Ultramonkey]
root@vsrv123:/etc/ha.d# cat ldirectord.cf
checktimeout=10
checkinterval=2
autoreload=no
logfile="/var/log/ultramonkey.log"
quiescent=yes

virtual = 10.0.0.143:80
        real = 10.0.0.122:80 gate
        real = 10.0.0.132:80 gate
        service = http
        request = "ldirector.html"
        receive = "Test Page"
        scheduler = rr
        protocol = tcp
        checktype = negotiate


virtual = 10.0.0.143:443
        real = 10.0.0.122:443 gate
        real = 10.0.0.132:443 gate
        service = https
        request = "ldirector.html"
        receive = "Test Page"
        scheduler = rr
        protocol = tcp
        checktype = negotiate

virtual = 10.0.0.143:10002
        real = 10.0.0.122:10002 gate
        real = 10.0.0.132:10002 gate
        service = https
        request = "ldirector.html"
        receive = "Test Page"
        scheduler = rr
        protocol = tcp
        checktype = negotiate
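
As a rough sanity check on the configuration above, the sketch below estimates how many connections the health checks alone would leave in TIME_WAIT on one real server. It assumes that one director probes every virtual service on that server once per checkinterval, that each probe opens a new TCP connection, and that TIME_WAIT lasts 60 seconds (the usual Linux value). The function names are hypothetical.

```python
CHECK_INTERVAL_S = 2   # checkinterval from ldirectord.cf
VIRTUAL_SERVICES = 3   # ports 80, 443 and 10002
TIME_WAIT_S = 60       # assumption: typical Linux TIME_WAIT duration


def checks_per_second_per_real(virtual_services: int, check_interval_s: float) -> float:
    """Probe connections per second arriving at a single real server."""
    return virtual_services / check_interval_s


def sockets_in_time_wait(conn_rate_per_s: float, time_wait_s: float) -> float:
    """Steady-state number of sockets sitting in TIME_WAIT at a given rate."""
    return conn_rate_per_s * time_wait_s


rate = checks_per_second_per_real(VIRTUAL_SERVICES, CHECK_INTERVAL_S)
print(rate, sockets_in_time_wait(rate, TIME_WAIT_S))  # prints: 1.5 90.0
```

Under these assumptions the checks account for about 90 sockets per real server, far fewer than the 5,000-plus connections reported on port 80, so the check traffic at its normal rate would not explain the pile-up by itself.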