I found a TCP hole punching module and downloaded it to test; it ships as a jar.


I followed every step on the project homepage, but running it on Android throws an exception.


03-27 16:10:46.123: E/AndroidRuntime(16929): FATAL EXCEPTION: main

03-27 16:10:46.123: E/AndroidRuntime(16929): java.lang.ExceptionInInitializerError

03-27 16:10:46.123: E/AndroidRuntime(16929): at com.ahope.test_tcp_hole.MainActivity.onClick(MainActivity.java:34)

03-27 16:10:46.123: E/AndroidRuntime(16929): at android.view.View.performClick(View.java:4114)

03-27 16:10:46.123: E/AndroidRuntime(16929): at android.view.View$PerformClick.run(View.java:17097)

03-27 16:10:46.123: E/AndroidRuntime(16929): at android.os.Handler.handleCallback(Handler.java:615)

03-27 16:10:46.123: E/AndroidRuntime(16929): at android.os.Handler.dispatchMessage(Handler.java:92)

03-27 16:10:46.123: E/AndroidRuntime(16929): at android.os.Looper.loop(Looper.java:137)

03-27 16:10:46.123: E/AndroidRuntime(16929): at android.app.ActivityThread.main(ActivityThread.java:4885)

03-27 16:10:46.123: E/AndroidRuntime(16929): at java.lang.reflect.Method.invokeNative(Native Method)

03-27 16:10:46.123: E/AndroidRuntime(16929): at java.lang.reflect.Method.invoke(Method.java:511)

03-27 16:10:46.123: E/AndroidRuntime(16929): at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:790)

03-27 16:10:46.123: E/AndroidRuntime(16929): at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:557)

03-27 16:10:46.123: E/AndroidRuntime(16929): at dalvik.system.NativeStart.main(Native Method)

03-27 16:10:46.123: E/AndroidRuntime(16929): Caused by: java.lang.ExceptionInInitializerError

03-27 16:10:46.123: E/AndroidRuntime(16929): at org.slf4j.impl.Log4jLoggerFactory.getLogger(Log4jLoggerFactory.java:73)

03-27 16:10:46.123: E/AndroidRuntime(16929): at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:242)

03-27 16:10:46.123: E/AndroidRuntime(16929): at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:254)

03-27 16:10:46.123: E/AndroidRuntime(16929): at de.htwg_konstanz.in.uce.hp.parallel.target.HolePunchingTarget.<clinit>(HolePunchingTarget.java:51)

03-27 16:10:46.123: E/AndroidRuntime(16929): ... 12 more

03-27 16:10:46.123: E/AndroidRuntime(16929): Caused by: java.lang.VerifyError: org/apache/log4j/config/PropertySetter

03-27 16:10:46.123: E/AndroidRuntime(16929): at org.apache.log4j.PropertyConfigurator.parseAppender(PropertyConfigurator.java:772)

03-27 16:10:46.123: E/AndroidRuntime(16929): at org.apache.log4j.PropertyConfigurator.parseCategory(PropertyConfigurator.java:735)

03-27 16:10:46.123: E/AndroidRuntime(16929): at org.apache.log4j.PropertyConfigurator.configureRootCategory(PropertyConfigurator.java:615)

03-27 16:10:46.123: E/AndroidRuntime(16929): at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:502)

03-27 16:10:46.123: E/AndroidRuntime(16929): at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:547)

03-27 16:10:46.123: E/AndroidRuntime(16929): at org.apache.log4j.helpers.OptionConverter.selectAndConfigure(OptionConverter.java:483)

03-27 16:10:46.123: E/AndroidRuntime(16929): at org.apache.log4j.LogManager.<clinit>(LogManager.java:127)

03-27 16:10:46.123: E/AndroidRuntime(16929): ... 16 more


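In short, Apache log4j cannot be used on Android...


Searching turned up plenty of examples and fixes for using log4j directly on Android, but nothing for my case.



I considered cloning the Java source with git and commenting out the logging calls, but then wondered whether I could simply replace the class files inside the jar.


I downloaded a modified jar from http://www.slf4j.org/download.html, put its class files into the original jar, and it works.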






Source: http://androidkr.blogspot.kr/2012/03/mac-mac-os-x-lion-wget.html


After installing MacPorts:

  • sudo port selfupdate
  • sudo port install wget


Source: http://stackoverflow.com/questions/448944/c-non-blocking-keyboard-input


On Windows you can include conio.h and call _kbhit() to check whether a keyboard event has occurred. Linux has no equivalent function, so I searched for one.


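#ifndef WIN32
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/* Returns 1 when stdin (fd 0) has input ready, 0 otherwise; never blocks. */
int _kbhit()
{
    struct timeval tv = { 0L, 0L };   /* zero timeout: poll and return at once */
    fd_set fds;

    FD_ZERO(&fds);
    FD_SET(0, &fds);
    /* select() returns -1 on error, so compare instead of returning it directly */
    return select(1, &fds, NULL, NULL, &tv) > 0;
}
#endif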



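The code above does not work in every case, so I looked further and found the following.

It is supposedly from an open-source Linux kbhit implementation, but I forgot exactly where ;;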


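#include <stdio.h>
#include <termios.h>
#include <unistd.h>

static struct termios initial_settings, new_settings;
static int peek_character = -1;   /* one-character lookahead filled by _kbhit() */

/* Switch stdin to non-canonical, no-echo mode. */
void init_keyboard()
{
    tcgetattr(0, &initial_settings);
    new_settings = initial_settings;
    new_settings.c_lflag &= ~ICANON;
    new_settings.c_lflag &= ~ECHO;
    new_settings.c_cc[VMIN] = 1;
    new_settings.c_cc[VTIME] = 0;
    tcsetattr(0, TCSANOW, &new_settings);
}

/* Restore the terminal settings saved by init_keyboard(). */
void close_keyboard()
{
    tcsetattr(0, TCSANOW, &initial_settings);
}

/* Returns 1 if a key is waiting, 0 otherwise; never blocks. */
int _kbhit()
{
    unsigned char ch;
    int nread;

    if (peek_character != -1) return 1;

    new_settings.c_cc[VMIN] = 0;      /* make read() return immediately */
    tcsetattr(0, TCSANOW, &new_settings);
    nread = read(0, &ch, 1);
    new_settings.c_cc[VMIN] = 1;      /* back to blocking reads */
    tcsetattr(0, TCSANOW, &new_settings);

    if (nread == 1)
    {
        peek_character = ch;
        return 1;
    }
    return 0;
}

/* Returns the next key, blocking until one arrives; -1 if the read fails. */
int _getch()
{
    unsigned char ch;   /* unsigned so bytes >= 0x80 do not come back negative */

    if (peek_character != -1)
    {
        ch = peek_character;
        peek_character = -1;
        return ch;
    }
    if (read(0, &ch, 1) != 1) return -1;
    return ch;
}

/* Print one character and flush so it shows up immediately. */
int _putch(int c) {
    putchar(c);
    fflush(stdout);
    return c;
}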


Usage


init_keyboard();
while (1) {
    if (_kbhit()) {
        int ch = _getch();
        _putch(ch);
        switch (ch) {
        ...
        }
    }
}
close_keyboard();



_putch is called because _getch reads straight from the input buffer without echoing to the screen, so the character has to be printed for the user to see it.


Reference: http://forums.gentoo.org/viewtopic-t-826825-start-0.html


Full cone

# iptables -t nat -A POSTROUTING -o eth0 -p udp --sport {source port} -j SNAT --to-source {NAT public ip}

# iptables -t nat -A PREROUTING -i eth0 -p udp --dport {destination port} -j DNAT --to-destination {Client ip}


Symmetric

Setting up MASQUERADE is enough.



The original post follows.

I think the wikipedia page makes those distinctions a little clearer: 
http://en.wikipedia.org/wiki/Network_address_translation#Types_of_NAT 
I will be working on these definitions, which I take to be equivalent to yours. 

I disagree with Pedro Gonçalves's rules. In the Full Cone NAT rules he provides, he doesn't match ports, and so it seems as though all traffic coming in on eth0 would be forwarded through to 10.0.0.1 and all traffic leaving eth0 would be SNAT sourced from 192.168.2.170, regardless of port. The specifications specifically mention a particular port. 

I also expanded on Pedro Gonçalves's naming convention by adding interface names and host names: 
Public, 192.168.2.170, $EXTIF, router.network 
Private, 10.0.0.1, $INTIF, inner.network 
Port is $P in all cases (although it wouldn't have to be). 

The way I comprehend the question, a port number $P is given and must be a part of the rules. 

I don't think these rules are perfect; I'm the least sure about the restricted cones. Nevertheless I think it will move you in the right direction. 

Full cone NAT 
this covers outgoing traffic which should be rewritten to appear to come from router.network:$P. 1 ea. for UDP, TCP 

Code:
iptables -t nat -A POSTROUTING -o $EXTIF -p tcp --sport $P -j SNAT --to-source 192.168.2.170
iptables -t nat -A POSTROUTING -o $EXTIF -p udp --sport $P -j SNAT --to-source 192.168.2.170


now we need the reverse direction, incoming traffic on $P is forwarded to 10.0.0.1 

Code:
iptables -t nat -A PREROUTING -i $EXTIF -p tcp --dport $P -j DNAT --to-destination 10.0.0.1
iptables -t nat -A PREROUTING -i $EXTIF -p udp --dport $P -j DNAT --to-destination 10.0.0.1



[Address] Restricted Cone Nat 
Here we reject incoming packets that aren't already established. First we need the rules above. Then we need an INPUT rule that will match incoming connections on $EXTIF:$P 
and accept only those which are connected already. Thus the connection must be instigated by inner.network. 

Code:

# previous rules 
iptables -t nat -A POSTROUTING -o $EXTIF -p tcp --sport $P -j SNAT --to-source 192.168.2.170
iptables -t nat -A POSTROUTING -o $EXTIF -p udp --sport $P -j SNAT --to-source 192.168.2.170
iptables -t nat -A PREROUTING -i $EXTIF -p tcp --dport $P -j DNAT --to-destination 10.0.0.1
iptables -t nat -A PREROUTING -i $EXTIF -p udp --dport $P -j DNAT --to-destination 10.0.0.1
# FILTER rules to drop, rather than forward, new connections 
# we accept already established connections (These are only necessary if default policy is not ACCEPT) 
iptables -A INPUT -i $EXTIF -p tcp --dport $P -m state --state ESTABLISHED,RELATED -j ACCEPT 
iptables -A INPUT -i $EXTIF -p udp --dport $P -m state --state ESTABLISHED,RELATED -j ACCEPT 
# now rules to drop the packets otherwise (only necessary if default policy is not DROP) 
iptables -A INPUT -i $EXTIF -p tcp --dport $P -m state --state NEW -j DROP 
iptables -A INPUT -i $EXTIF -p udp --dport $P -m state --state NEW -j DROP 



Port Restricted Cone Nat 
This is the same as the above, except we also check the source port on the INPUT chain. 

Code:

# previous rules 
iptables -t nat -A POSTROUTING -o $EXTIF -p tcp --sport $P -j SNAT --to-source 192.168.2.170
iptables -t nat -A POSTROUTING -o $EXTIF -p udp --sport $P -j SNAT --to-source 192.168.2.170
iptables -t nat -A PREROUTING -i $EXTIF -p tcp --dport $P -j DNAT --to-destination 10.0.0.1
iptables -t nat -A PREROUTING -i $EXTIF -p udp --dport $P -j DNAT --to-destination 10.0.0.1
# FILTER rules to drop, rather than forward, new connections 
# we accept already established connections (These are only necessary if default policy is not ACCEPT) 
iptables -A INPUT -i $EXTIF -p tcp --sport $P --dport $P -m state --state ESTABLISHED,RELATED -j ACCEPT 
iptables -A INPUT -i $EXTIF -p udp --sport $P --dport $P -m state --state ESTABLISHED,RELATED -j ACCEPT 
# now rules to drop the packets otherwise (only necessary if default policy is not DROP) 
iptables -A INPUT -i $EXTIF -p tcp --dport $P -m state --state NEW -j DROP 
iptables -A INPUT -i $EXTIF -p udp --dport $P -m state --state NEW -j DROP 



Symmetric NAT 
It seems that this could be called 'Full Nat' or 'Masquerading'. New connections are never forwarded through router.network to inner.network, but new connections are dynamically mapped to ports on $EXTIF. This is pretty complicated, but the iptables rule is very easy. 

Code:

# no other rules are required for this.  
iptables -t nat -I POSTROUTING -s 10.0.0.1 -o $EXTIF  -j MASQUERADE 



Reference: http://en.wikipedia.org/wiki/Network_address_translation#Methods_of_port_translation


Two broad kinds

- Symmetric NAT

- Cone NAT


- Full cone

  - Any external host can send and receive packets through the mapped NAT port.

- Address Restricted cone

  - Once a port is mapped for exchanging packets with host A, no other host can send or receive packets through that port.

  - Host A can send and receive from any of its ports, not only the one used when the mapping was created.

- Port Restricted cone

  - Once a port is mapped for host A, host A cannot send or receive through any other port of its own (similar to Symmetric NAT).

- Symmetric NAT

  - The NAT allocates a separate port for each external host, mapped 1:1.

  - A mapped port can exchange packets only with the external host it was mapped for.
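
The inbound filtering rules in the list above can be sketched in C as follows. This is a simplified model, not any router's implementation: the names nat_type, endpoint and nat_accepts_inbound are made up for illustration, and each mapping is assumed to remember a single external endpoint that the internal host contacted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical types for illustration only; not taken from any library. */
typedef enum {
    NAT_FULL_CONE,
    NAT_ADDR_RESTRICTED,
    NAT_PORT_RESTRICTED,
    NAT_SYMMETRIC
} nat_type;

typedef struct {
    uint32_t ip;
    uint16_t port;
} endpoint;

/* Returns true if a packet from `src` arriving on a mapped NAT port is let
   through, given that the mapping was created by sending to `contacted`. */
bool nat_accepts_inbound(nat_type type, endpoint contacted, endpoint src)
{
    switch (type) {
    case NAT_FULL_CONE:
        return true;                          /* anyone may use the mapping */
    case NAT_ADDR_RESTRICTED:
        return src.ip == contacted.ip;        /* same host, any source port */
    case NAT_PORT_RESTRICTED:
    case NAT_SYMMETRIC:
        return src.ip == contacted.ip && src.port == contacted.port;
    }
    assert(0 && "unknown nat_type");
    return false;
}
```

In this model Port Restricted and Symmetric filter identically. What sets Symmetric NAT apart is the mapping side: it allocates a different external port per destination, which this sketch does not model.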



For how to configure each type, see the post on per-type NAT settings.



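Addendum 1. Related RFCs

Relationship between RFC 5780 and RFC 3489
(EIM/ADM/APDM: endpoint-independent, address-dependent, address-and-port-dependent mapping; EIF/ADF/APDF: the same three for filtering)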
[RFC5780] EIM + EIF = [RFC3489] Full Cone
[RFC5780] EIM + ADF = [RFC3489] Restricted Cone
[RFC5780] EIM + APDF = [RFC3489] Port Restricted Cone
[RFC5780] ADM + EIF
[RFC5780] ADM + ADF
[RFC5780] ADM + APDF
[RFC5780] APDM + EIF
[RFC5780] APDM + ADF
[RFC5780] APDM + APDF = [RFC3489] Symmetric
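
The table above can be read as a lookup from a (mapping, filtering) pair to the older RFC 3489 name. A minimal C sketch, with enum and function names invented for illustration; combinations that the table leaves without an RFC 3489 name return NULL:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical enums for illustration only. */
typedef enum { MAP_EI, MAP_AD, MAP_APD } mapping_behavior;
typedef enum { FILT_EI, FILT_AD, FILT_APD } filtering_behavior;

/* Returns the RFC 3489 name for a mapping/filtering pair,
   or NULL when the table lists no equivalent. */
const char *rfc3489_name(mapping_behavior m, filtering_behavior f)
{
    assert(m <= MAP_APD && f <= FILT_APD);
    if (m == MAP_EI) {
        switch (f) {
        case FILT_EI:  return "Full Cone";
        case FILT_AD:  return "Restricted Cone";
        case FILT_APD: return "Port Restricted Cone";
        }
    }
    if (m == MAP_APD && f == FILT_APD)
        return "Symmetric";
    return NULL;   /* e.g. ADM + EIF has no RFC 3489 name */
}
```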


Understanding the life cycle (iPhone / Android)

[Image not preserved]



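I had been using a pure Java module for encryption, but someone suggested native code would be faster, so I reimplemented it with JNI and C++.


During testing the app kept dying in native code with a "stack corruption detected: aborted" error.


It only crashed during encryption/decryption when the source data was larger than the temporary buffer, an unsigned char array declared on the native side.


Over yesterday and today I spent close to six hours going over the source and the logs, and found nothing wrong in either syntax or logic.


Going by the error message, either the temporary buffer or the source/result data was being mishandled, and I had already confirmed that the code touching the source and result data was fine.


Watching the value just past the last element of the temporary buffer showed that it changed after EVP_CipherUpdate ran. I had dismissed this at first because the output length EVP_CipherUpdate reported was exactly the size of the temporary buffer. In the end I concluded that EVP_CipherUpdate touches more than the bytes it reports, and enlarging the temporary buffer by the cipher's block size made everything work.


=> Looking up the documentation for the update function afterwards, it says the out buffer must have room for input length + BLOCK_SIZE - 1 bytes ㅡㅡ;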
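
As a small illustration of that sizing rule, here is a hypothetical helper (the name cipher_update_out_capacity is made up, not an OpenSSL function). As far as I recall from the OpenSSL manual pages, encryption may write up to inl + block_size - 1 bytes and decryption may need inl + block_size, so this sketch reserves the larger of the two; treat that reading as an assumption and check the manual for your OpenSSL version.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: bytes to reserve for the `out` buffer passed to an
   EVP_*Update call that is given `in_len` input bytes. Uses the more
   conservative in_len + block_size bound (assumption, see the text above). */
size_t cipher_update_out_capacity(size_t in_len, size_t block_size)
{
    assert(block_size > 0);
    return in_len + block_size;
}
```

For a 1024-byte chunk with a 16-byte block cipher this reserves 1040 bytes, rather than the 1024 that overflowed in the case described above.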



For example, suppose there is a Java class A defined as follows.
A.java
class A {
    int aa;
    byte[] bb;
    static { System.loadLibrary(...); }
    native void func();
}

native.cpp
#include <jni.h>

// The exported name follows the Java_<class>_<method> convention (A is in the default package here).
extern "C" JNIEXPORT void JNICALL Java_A_func(JNIEnv *e, jobject self)
{
    jclass cls = e->GetObjectClass(self);
    jfieldID aaId = e->GetFieldID(cls, "aa", "I");
    jfieldID bbId = e->GetFieldID(cls, "bb", "[B");

    jint aa = e->GetIntField(self, aaId);                        // int fields are read with GetIntField
    jbyteArray bb = (jbyteArray) e->GetObjectField(self, bbId);  // arrays are objects
    ....
}

When the field is a String

jstring str = (jstring) e->GetObjectField(self, fid);
const char *pcName = e->GetStringUTFChars(str, NULL);
strcpy(user.caName, pcName);
e->ReleaseStringUTFChars(str, pcName);   // always release what GetStringUTFChars returned



The table below lists the values to pass as the third argument of GetFieldID.

Type Signature             Java Type
Z                          boolean
B                          byte
C                          char
S                          short
I                          int
J                          long
F                          float
D                          double
L fully-qualified-class ;  fully-qualified-class
[ type                     type[]
( arg-types ) ret-type     method type
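
The primitive rows of the table can be expressed as a small lookup. This is only an illustrative sketch; jni_primitive_signature is a made-up helper, not part of the JNI API, and it returns '\0' for anything that is not a Java primitive type name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: maps a Java primitive type name to its JNI
   signature character, or '\0' if the name is not a primitive type. */
char jni_primitive_signature(const char *java_type)
{
    static const struct { const char *name; char sig; } table[] = {
        { "boolean", 'Z' }, { "byte",  'B' }, { "char",  'C' },
        { "short",   'S' }, { "int",   'I' }, { "long",  'J' },
        { "float",   'F' }, { "double", 'D' },
    };
    assert(java_type != NULL);
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (strcmp(java_type, table[i].name) == 0)
            return table[i].sig;
    }
    return '\0';
}
```

Note the two entries that are easy to get wrong: boolean is Z and long is J, because B and L are already taken by byte and class references.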




In Activity A, to keep in-memory data across changes such as screen rotation, I set the following attribute on the Activity in the manifest.


android:configChanges="orientation"


This setting basically works, but the usually recommended one is:


android:configChanges="orientation|keyboardHidden"


I had not known this because I was working from GB (Gingerbread) source, but despite the setting above, on JB MR2 onConfigurationChanged() was not called; instead the onSavedIns...() -> onDestroy() -> onCreate() sequence ran.


Digging through the Android developer site turned up the following. (Source: http://developer.android.com/guide/topics/resources/runtime-changes.html)


Caution: Beginning with Android 3.2 (API level 13), the "screen size" also changes when the device switches between portrait and landscape orientation. Thus, if you want to prevent runtime restarts due to orientation change when developing for API level 13 or higher (as declared by the minSdkVersion and targetSdkVersion attributes), you must include the "screenSize" value in addition to the "orientation" value. That is, you must declare android:configChanges="orientation|screenSize". However, if your application targets API level 12 or lower, then your activity always handles this configuration change itself (this configuration change does not restart your activity, even when running on an Android 3.2 or higher device).


With the final setting below, it works.


android:configChanges="orientation|keyboardHidden|screenSize"


Source: http://www.lognormal.com/blog/2012/09/27/linux-tcpip-tuning/



We’re a performance company, and performance and scalability go hand in hand. Better scalability results in more consistent performance and at LogNormal, we like pushing our hardware as far as it will go.

Today’s post is about some of the infrastructure we use and how we tune it to handle a large number of requests.

We have separate components of our software stack to handle different tasks. In this post I’ll only cover the parts that make up our beacon collection component and how we tune it. Only a few of the tuning points are specific to this component.

The Stack

(side note… someone needs to start a Coffee Shop + Co-working Space called The Stack).

  • The beacon collector runs Linux at its base. We use a combination of Ubuntu 11.10 and 12.04, which for most purposes are the same. If you’re going with a new implementation though, I’d suggest 12.04 (or at least the 3.x kernels).

  • Slightly higher up is iptables to restrict inbound connections. This is mainly because we’re hosted on shared infrastructure and need to restrict internal communications only to hosts that we trust. iptables is the cheapest way to do this, but it brings in a few caveats that we address in the tuning section later.

  • We then have nginx set up to serve HTTP traffic on ports 80 and 443 and do some amount of filtering (more on this later)

  • Behind nginx is our custom node.js server that handles and processes beacons as they come in. It reads some configuration data from couchdb and then sends these processed beacons out into the ether. Nginx and node talk to each other over a unix domain socket.

That’s about all that’s relevant for this discussion, but at the heart of it, you’ll see that there are lots of file handles and sockets in use at any point of time.

A large part of this is due to the fact that nginx only uses HTTP/1.0 when it proxies requests to a back end server, and that means it opens a new connection on every request rather than using a persistent connection.

What should we tune?

In this post I’ll talk only about the first two parts of our stack. Linux and iptables.

Open files

Since we deal with a lot of file handles (each TCP socket requires a file handle), we need to keep our open file limit high. The current value can be seen using ulimit -a (look for open files). We set this value to 999999 and hope that we never need a million or more files open. In practice we never do.

We set this limit by putting a file into /etc/security/limits.d/ that contains the following two lines:

*	soft	nofile	999999
*	hard	nofile	999999

(side note: it took me 10 minutes trying to convince Markdown that those asterisks were to be printed as asterisks)

If you don’t do this, you’ll run out of open file handles and could see one or more parts of your stack die.
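
To check from inside a process which limit it is actually running with, something like the following C sketch can be used. The wrapper name open_file_limits is made up for illustration; getrlimit and RLIMIT_NOFILE are the standard POSIX interface.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/resource.h>

/* Hypothetical wrapper: stores the soft and hard open-file limits of the
   current process. Returns 0 on success, -1 if getrlimit() fails. */
int open_file_limits(rlim_t *soft, rlim_t *hard)
{
    struct rlimit rl;

    assert(soft != NULL && hard != NULL);
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;
    *soft = rl.rlim_cur;   /* the limit the process currently runs with */
    *hard = rl.rlim_max;   /* the ceiling the soft limit may be raised to */
    return 0;
}
```

The soft value is the one that corresponds to the "open files" line of ulimit -a in the shell that started the process.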

Ephemeral Ports

The second thing to do is to increase the number of Ephemeral Ports available to your application. By default this is all ports from 32768 to 61000. We change this to all ports from 18000 to 65535. Ports below 18000 are reserved for current and future use of the application itself. This may change in the future, but is sufficient for what we need right now, largely because of what we do next.

TIME_WAIT state

TCP connections go through various states during their lifetime. There’s the handshake that goes through multiple states, then the ESTABLISHED state, and then a whole bunch of states for either end to terminate the connection, and finally a TIME_WAIT state that lasts a really long time. If you’re interested in all the states, read through the netstat man page, but right now the only one we care about is the TIME_WAIT state, and we care about it mainly because it’s so long.

By default, a connection is supposed to stay in the TIME_WAIT state for twice the msl. Its purpose is to make sure any lost packets that arrive after a connection is closed do not confuse the TCP subsystem (the full details of this are beyond the scope of this article, but ask me if you’d like details). The default msl is 60 seconds, which puts the default TIME_WAIT timeout value at 2 minutes. Which means you’ll run out of available ports if you receive more than about 400 requests a second, or if we look back to how nginx does proxies, this actually translates to 200 requests per second. Not good for scaling.

We fixed this by setting the timeout value to 1 second.

I’ll let that sink in a bit. Essentially we reduced the timeout value by 99.16%. This is a huge reduction, and not to be taken lightly. Any documentation you read will recommend against it, but here’s why we did it.
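
The arithmetic behind these numbers is just available ports divided by the TIME_WAIT duration. A rough C sketch, with a function name made up for illustration and the port count taken as last minus first, the same way the article counts 18000 to 65535:

```c
#include <assert.h>

/* Hypothetical helper: upper bound on new connections per second before
   every ephemeral port is parked in TIME_WAIT. */
double max_connections_per_second(int first_port, int last_port,
                                  double time_wait_seconds)
{
    assert(last_port >= first_port && time_wait_seconds > 0);
    return (last_port - first_port) / time_wait_seconds;
}
```

With the widened 18000 to 65535 range and a 120 second TIME_WAIT this gives roughly 396 per second, which matches the "about 400" above; the default 32768 to 61000 range would give only about 235. With a 1 second timeout the same range allows about 47535 per second.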

Again, remember the point of the TIME_WAIT state is to avoid confusing the transport layer. The transport layer will get confused if it receives an out of order packet on a currently established socket, and send a reset packet in response. The key here is the term established socket. A socket is a tuple of 4 terms. The source and destination IPs and ports. Now for our purposes, our server IP is constant, so that leaves 3 variables.

Our port numbers are recycled, and we have 47535 of them. That leaves the other end of the connection.

In order for a collision to take place, we’d have to get a new connection from an existing client, AND that client would have to use the same port number that it used for the earlier connection, AND our server would have to assign the same port number to this connection as it did before. Given that we use persistent HTTP connections between clients and nginx, the probability of this happening is so low that we can ignore it. 1 second is a long enough TIME_WAIT timeout.

The two TCP tuning parameters were set using sysctl by putting a file into /etc/sysctl.d/ with the following:

net.ipv4.ip_local_port_range = 18000    65535
net.ipv4.netfilter.ip_conntrack_tcp_timeout_time_wait = 1

Connection Tracking

The next parameter we looked at was Connection Tracking. This is a side effect of using iptables. Since iptables needs to allow two-way communication between established HTTP and ssh connections, it needs to keep track of which connections are established, and it puts these into a connection tracking table. This table grows. And grows. And grows.

You can see the current size of this table using sysctl net.netfilter.nf_conntrack_count and its limit using sysctl net.nf_conntrack_max. If count crosses max, your linux system will stop accepting new TCP connections and you’ll never know about this. The only indication that this has happened is a single line hidden somewhere in /var/log/syslog saying that you’re out of connection tracking entries. One line, once, when it first happens.

A better indication is if count is always very close to max. You might think, “Hey, we’ve set max exactly right.”, but you’d be wrong.

What you need to do (or at least that’s what you first think) is to increase max.

Keep in mind though, that the larger this value, the more RAM the kernel will use to keep track of these entries. RAM that could be used by your application.

We started down this path, increasing net.nf_conntrack_max, but soon we were just pushing it up every day. Connections that were getting in there were never getting out.

nf_conntrack_tcp_timeout_established

It turns out that there’s another timeout value you need to be concerned with. The established connection timeout. Technically this should only apply to connections that are in the ESTABLISHED state, and a connection should get out of this state when a FIN packet goes through in either direction. This doesn’t appear to happen and I’m not entirely sure why.

So how long do connections stay in this table then? The default value for nf_conntrack_tcp_timeout_established is 432000 seconds. I’ll wait for you to do the long division…

Fun times.

I changed the timeout value to 10 minutes (600 seconds) and in a few days time I noticed conntrack_count go down steadily until it sat at a very manageable level of a few thousand.

We did this by adding another line to the sysctl file:

net.netfilter.nf_conntrack_tcp_timeout_established=600

Speed bump

At this point we were in a pretty good state. Our beacon collectors ran for months (not counting scheduled reboots) without a problem, until a couple of days ago, when one of them just stopped responding to any kind of network requests. No ping responses, no ACK packets to a SYN, nothing. All established ssh and HTTP connections terminated and the box was doing nothing. I still had console access, and couldn’t tell what was wrong. The system was using less than 1% CPU and less than 10% of RAM. All processes that were supposed to be running were running, but nothing was coming in or going out.

I looked through syslog, and found one obscure message repeated several times.

IPv4: dst cache overflow

Well, there were other messages, but this was the one that mattered.

I did a bit of searching online, and found something about an rt_cache leak in 2.6.18. We’re on 3.5.2, so it shouldn’t have been a problem, but I investigated anyway.

The details of the post above related to 2.6, and 3.5 was different, with no ip_dst_cache entry in /proc/slabinfo so I started searching for its equivalent on 3.5 when I came across Vincent Bernat's post on the IPv4 route cache. This is an excellent resource to understand the route cache on linux, and that’s where I found out about the lnstat command. This is something that needs to be added to any monitoring and stats gathering scripts that you run. Further reading suggests that the dst cache gc routines are complicated, and a bug anywhere could result in a leak, one which could take several weeks to become apparent.

From what I can tell, there doesn’t appear to be an rt_cache leak. The number of cache entries increases and decreases with traffic, but I’ll keep monitoring it to see if that changes over time.

Other things to tune

There are a few other things you might want to tune, but they’re becoming less of an issue as base system configs evolve.

TCP Window Sizes

This is related to TCP Slow Start, and I’d love to go into the details, but our friends Sajal and Aaron over at CDN Planet have already done an awesome job explaining how to tune TCP initcwnd for optimum performance.

This is not an issue for us because the 3.5 kernel’s default window size is already set to 10.

Window size after idle

Related to the above is the sysctl setting net.ipv4.tcp_slow_start_after_idle. This tells the system whether it should start at the default window size only for new TCP connections or also for existing TCP connections that have been idle for too long (on 3.5, too long is 1 second, but see net.sctp.rto_initial for its current value on your system). If you’re using persistent HTTP connections, you’re likely to end up in this state, so set net.ipv4.tcp_slow_start_after_idle=0 (just put it into the sysctl config file mentioned above).

Endgame

After changing all these settings, a single quad core vm (though using only one core) with 1Gig of RAM has been able to handle all the load that’s been thrown at it. We never run out of open file handles, never run out of ports, never run out of connection tracking entries and never run out of RAM.

We have several weeks before another one of our beacon collectors runs into the dst cache issue, and I’ll be ready with the numbers when that happens.

Thanks for reading, and let us know how these settings work out for you if you try them out. If you’d like to measure the real user impact of your changes, have a look at our Real User Measurement tool at LogNormal.

Update 2012-09-28: There are some great comments on hacker news with much more information.
