On 6/15/2014 3:39 PM, Mark van der Meulen wrote:
> Hi List,
>
> I'm wondering if anyone can help me with this problem, or at least point
> me in the direction of where to start looking. I have FreeBSD 9 based
> servers which are crashing every 4-10 days and producing crash dumps
> similar to this one: http://pastebin.com/F82Jc08C
>
> All crash dumps seem to involve the netgraph code, and the current process
> is always ng_queueX.
>
> In summary, we have 4 x FreeBSD servers running as LNS (MPD5) for around
> 2000 subscribers, with 3 of the servers running a modified version of
> BSDRP and the fourth running a FreeBSD 9 install built from what I thought
> was the latest stable kernel source, since I fetched it from stable/9;
> however it shows up as 9.3-BETA in uname (the linked crash dump is from
> that server).
>
> 3 x LNS running modified BSDRP: DELL PowerEdge 2950, 2 x Xeon E5320, 4GB
> RAM, igb Quad Port NIC in LAGG, Quagga, MPD5, IPFW for Host Access
> Control, NTPD, BSNMPD
> 1 x LNS running latest FreeBSD 9 code: HP ProLiant DL380, 2 x Xeon X5465,
> 36GB RAM, em Quad Port NIC in LAGG, BIRD, MPD5, IPFW for Host Access
> Control, NTPD, BSNMPD
>
> The reason I built the fresh server on FreeBSD 9 is that I cannot save
> crash dumps for BSDRP easily. In short, the problem is this: servers with
> 10-50 clients will run indefinitely (as long as we have had them, which is
> probably about 1.5 years) without errors and serve clients fine, however
> any with over 300 clients appear to stay online for only 4-10 days at most
> before crashing and rebooting. I have attached the crash file from the
> latest crash on the LNS running the latest FreeBSD 9 code, however I am
> unsure what to do with it and where to look.
>
> When these devices crash they are often pushing in excess of 200Mbps
> (anywhere between 200Mbps and 450Mbps) with very little load (3-4.5 on
> the first three, less than 2 on the fourth).
>
> Things I've done to attempt resolution:
>
> - Replaced bce network cards with em network cards. This produced far
> fewer errors on the interfaces (many before, none now) and I think let the
> machines stay up longer between reboots; before, it would happen up to
> once a day.
> - Replaced em network cards with igb network cards. All this did was lower
> load and give us a little more time between reboots.
> - Tried an implementation using FreeBSD 10 (this lasted less than 4 hours
> under load before rebooting)
> - Replaced memory
> - Increased memory on LNS4 to 36GB.
> - Various kernel rebuilds
> - Tweaked various kernel settings. This appears to have helped a little
> and given us more time between reboots.
> - Disabled IPv6
> - Disabled IPFW
> - Disabled BSNMPD
> - Disabled Netflow
> - Tried versions 5.6 and 5.7 of MPD5
>
> Anyone able to help me work out what the crash dump means? It only happens
> on servers running MPD5 (e.g. the exact same boxes with the exact same
> code pushing 800Mbps+ of routing have no crashes), and I can see the crash
> relates to netgraph, however I am unsure where to go from there...
>
> Thanks,
>
> Mark
>
>
> Relevant Current Settings:
>
> net.inet.ip.fastforwarding=1
> net.inet.ip.fw.default_to_accept=1
> net.bpf.zerocopy_enable=1
> net.inet.raw.maxdgram=16384
> net.inet.raw.recvspace=16384
> hw.intr_storm_threshold=64000
> net.inet.ip.fastforwarding=1
> net.inet.ip.fw.default_to_accept=1
> net.inet.ip.intr_queue_maxlen=10240
> net.inet.ip.redirect=0
> net.inet.ip.sourceroute=0
> net.inet.ip.rtexpire=2
> net.inet.ip.rtminexpire=2
> net.inet.ip.rtmaxcache=256
> net.inet.ip.accept_sourceroute=0
> net.inet.ip.process_options=0
> net.inet.icmp.log_redirect=0
> net.inet.icmp.drop_redirect=1
> net.inet.tcp.drop_synfin=1
> net.inet.tcp.blackhole=2
> net.inet.tcp.sendbuf_max=16777216
> net.inet.tcp.recvbuf_max=16777216
> net.inet.tcp.sendbuf_auto=1
> net.inet.tcp.recvbuf_auto=1
> net.inet.udp.recvspace=262144
> net.inet.udp.blackhole=0
> net.inet.udp.maxdgram=57344
> net.route.netisr_maxqlen=4096
> net.local.stream.recvspace=65536
> net.local.stream.sendspace=65536
> net.graph.maxdata=65536
> net.graph.maxalloc=65536
> net.graph.maxdgram=2096000
> net.graph.recvspace=2096000
> kern.ipc.somaxconn=32768
> kern.ipc.nmbclusters=524288
> kern.ipc.maxsockbuf=26214400
> kern.ipc.shmmax="2147483648"
> kern.ipc.nmbjumbop="53200"
> kern.ipc.maxpipekva="536870912"
> kern.random.sys.harvest.ethernet="0"
> kern.random.sys.harvest.interrupt="0"
> vm.kmem_size="4096M" # Only on box with over 12G RAM. Otherwise 2G.
>
>
> vm.kmem_size_max="8192M" # Only on box with over 12G RAM.
> hw.igb.rxd="4096"
> hw.igb.txd="4096"
> hw.em.rxd="4096"
> hw.em.txd="4096"
> hw.igb.max_interrupt_rate="32000"
>
> hw.igb.rx_process_limit="4096"
> hw.em.rx_process_limit="500"
> net.link.ifqmaxlen="20480"
> net.isr.dispatch="direct"
> net.isr.direct_force="1"
> net.isr.direct="1"
> net.isr.maxthreads="8"
> net.isr.numthreads="4"
> net.isr.bindthreads="1"
> net.isr.maxqlimit="20480"
> net.isr.defaultqlimit="8192"
>
>
The following workarounds have worked for some people.
They may not solve your problem, but they are worth a try:
1. Increase the netgraph limits (a quick way to check the current values
and zone usage is sketched after this list):
net.graph.maxdata=262140    # /boot/loader.conf
net.graph.maxalloc=262140   # /boot/loader.conf
2. Remove the FLOWTABLE kernel option.
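As a rough sanity check before raising the limits, something like the
following (a sketch; zone names may differ slightly between releases)
should show whether the netgraph item zones are near exhaustion, and
whether your running kernel was built with FLOWTABLE:

    # current netgraph limits and per-zone usage
    sysctl net.graph.maxdata net.graph.maxalloc
    vmstat -z | grep -i netgraph

    # if this prints any net.flowtable.* OIDs, FLOWTABLE is compiled in
    sysctl net.flowtable

If the FAIL column in the vmstat -z output is non-zero for the NetGraph
zones, the current limits are very likely too low for your load.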
It would also help if you put your kernel and core dump somewhere for
download so we can have a closer look at the panic trace.
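If you already have savecore(8) dumps on the FreeBSD 9 box, you can also
pull the panic backtrace yourself; roughly (a sketch assuming the default
dump location, adjust the vmcore number and kernel path for your system):

    # open the crash dump against the kernel that produced it
    kgdb /boot/kernel/kernel /var/crash/vmcore.0

    # then at the (kgdb) prompt:
    bt            # backtrace of the panicking thread
    info threads  # other threads, in case the panic is elsewhere

The /var/crash/core.txt.N file that crashinfo(8) normally writes after the
reboot already contains most of this and is usually small enough to attach
to a mail.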
--
Best regards.
Hooman Fazaeli
_______________________________________________
freebsd-bugs@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-bugs
To unsubscribe, send any mail to "freebsd-bugs-unsubscribe@freebsd.org"