Archive for 九月 16th, 2009

一次错误

在千橡注册过程中,我很老实的点了一个下一步,然后服务器就抛出线程异常了。

我运气这么好?
千橡错误

redhat网站查到下面的信息,说是因为内存不够的原因。我觉得这个可以当作出现这个问题的解释,但是却解释得不够“完美”,我仍旧还在疑惑中:如果是是因为内存不够的原因,那么在每次测试之前,只要保证机器状态一样,那么TCP: time wait bucket table overflow这条信息的输出次数就应该是一样的,或者差的不是很多,因为每次有一个socket来了,就给一个数据结构,直到内存用完输出TCP: time wait bucket table overflow信息,但是我测试的结果是差异不小,这就让我感觉疑惑了,莫非每次分配的数据结构不一样大?这不可能。我每次单独测试都是重启开发板,然后不运行任何其他程序就进行测试的,这样应该可以保证每次测试机器的状态不是差的很多吧。

The “TCP: time wait bucket table overflow” message shows when the kernel is unable to allocate a data structure to put a socket in the TIME_WAIT state.

This is happening according to linux/net/ipv4/tcp_minisocks.c:

if (tcp_tw_count < sysctl_tcp_max_tw_buckets)
tw = kmem_cache_alloc(tcp_timewait_cachep, SLAB_ATOMIC);

if(tw != NULL) {
(..)
} else {
/* Sorry, if we’re out of memory, just CLOSE this
* socket up. We’ve got bigger problems than
* non-graceful socket closings.
*/
if (net_ratelimit())
printk(KERN_INFO “TCP: time wait bucket table overflow\\n”);
}

This problem is more likely to happen on systems creating a lot of TCP connections at a fast pace. RFC 793 decided that those sockets should stay in the TIME_WAIT state for 2*MSL (Maximum Segment Life), but the Linux implementation seems to make the TIME_WAIT state last for 1 minute.

Monitor the resources used by those time wait buckets by watching:

# cat /proc/slabinfo | grep tcp_tw_bucket

The size of the time wait bucket can be adjusted by writing to /proc/sys/net/ipv4/tcp_max_tw_buckets. However, the link between the client and the server cannot cause packets to arrive out of order, then the TIME_WAIT state can be skipped and sockets can be recycled immediately. Socket recycling can be configured in /proc/sys/net/ipv4/tcp_tw_recycle, but check with your network administrator to verify whether it’s safe to do so.