Hi guys,

I have been having quite some trouble tracking down a problem that seems to be related to SMP on one of my production servers. The problem is not easily reproducible, but the best way I have found to trigger it is to run "make buildworld" while other things are going on (mysql, apache, bind, jails, etc). When SMP is active, the compile ends with a segfault or, quite rarely, with a crash. I recently configured the crash dump device but have still been unable to reproduce a full system crash.

At first I thought it was related to memory, so I ran some tests and replaced most of the DIMMs, but the problem was still there. To pinpoint the problem, I started from the GENERIC kernel, which I found to be stable, and added options to it. That is how I found that the problem is related to SMP. I then tried other changes, such as reducing the drivers in the kernel to the minimum I could, for several reasons. One of them is that the motherboard is a "Tyan thunder K7X" (http://www.tyan.com/archive/products/html/thunderk7x.html) and it has an onboard Adaptec SCSI controller which I don't use. Since the driver for this adapter is not MP safe, I tried disabling the controller in the BIOS and/or removing the driver from the kernel, but it had no effect. The SCSI adapter actually in use is the Dell 4/DC (LSILogic MegaRAID) you can see in the dmesg.

Now I have no clue how to debug this problem further.

dmesg from generic kernel:

Copyright (c) 1992-2008 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
	The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 6.3-RELEASE-p1 #0: Wed Feb 27 07:56:51 EST 2008
    root@megatron.mantor.org:/usr/obj/usr/src/sys/GENERIC
ACPI APIC Table: <PTLTD APIC >
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: AMD Athlon(tm) MP 2200+ (1800.07-MHz 686-class CPU)
  Origin = "AuthenticAMD" Id = 0x680 Stepping = 0
  Features=0x383fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE>
  AMD Features=0xc0480800<SYSCALL,MP,MMX+,3DNow!+,3DNow!>
real memory = 3220701184 (3071 MB)
avail memory = 3150741504 (3004 MB)
MADT: Forcing active-low polarity and level trigger for SCI
ioapic0 <Version 1.1> irqs 0-23 on motherboard
kbd1 at kbdmux0
ath_hal: 0.9.20.3 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413, RF5413)
hptrr: HPT RocketRAID controller driver v1.1 (Feb 27 2008 07:56:28)
acpi0: <PTLTD RSDT> on motherboard
acpi0: Power Button (fixed)
acpi0: Sleep Button (fixed)
Timecounter "ACPI-safe" frequency 3579545 Hz quality 850
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x8008-0x800b on acpi0
cpu0: <ACPI CPU> on acpi0
acpi_button0: <Power Button> on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff,0x8000-0x807f,0x8080-0x80ff iomem 0xd8000-0xdbfff on acpi0
pci0: <ACPI PCI bus> on pcib0
agp0: <AMD 762 host to AGP bridge> port 0x1810-0x1813 mem 0xf8000000-0xfbffffff,0xf6210000-0xf6210fff at device 0.0 on pci0
pcib1: <ACPI PCI-PCI bridge> at device 1.0 on pci0
pci1: <ACPI PCI bus> on pcib1
isab0: <PCI-ISA bridge> at device 7.0 on pci0
isa0: <ISA bus> on isab0
atapci0: <AMD 768 UDMA100 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xf000-0xf00f at device 7.1 on pci0
ata0: <ATA channel 0> on atapci0
ata1: <ATA channel 1> on atapci0
pci0: <bridge> at device 7.3 (no driver attached)
amr0: <LSILogic MegaRAID 1.53> mem 0xf6200000-0xf620ffff irq 20 at device 8.0 on pci0
amr0: delete logical drives supported by controller
amr0: <LSILogic PERC 4/DC> Firmware 350O, BIOS 1.09, 128MB RAM
ahc0: <Adaptec aic7899 Ultra160 SCSI adapter> port 0x1000-0x10ff mem 0xf4000000-0xf4000fff irq 20 at device 10.0 on pci0
ahc0: [GIANT-LOCKED]
aic7899: Ultra160 Wide Channel A, SCSI Id=7, 32/253 SCBs
ahc1: <Adaptec aic7899 Ultra160 SCSI adapter> port 0x1400-0x14ff mem 0xf4001000-0xf4001fff irq 21 at device 10.1 on pci0
ahc1: [GIANT-LOCKED]
aic7899: Ultra160 Wide Channel B, SCSI Id=7, 32/253 SCBs
pcib2: <ACPI PCI-PCI bridge> at device 16.0 on pci0
pci2: <ACPI PCI bus> on pcib2
ohci0: <OHCI (generic) USB controller> mem 0xf4100000-0xf4100fff irq 19 at device 0.0 on pci2
ohci0: [GIANT-LOCKED]
usb0: OHCI version 1.0, legacy support
usb0: SMM does not respond, resetting
usb0: <OHCI (generic) USB controller> on ohci0
usb0: USB revision 1.0
uhub0: AMD OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 4 ports with 4 removable, self powered
pci2: <display, VGA> at device 7.0 (no driver attached)
xl0: <3Com 3c980C Fast Etherlink XL> port 0x2400-0x247f mem 0xf4102000-0xf410207f irq 18 at device 8.0 on pci2
miibus0: <MII bus> on xl0
ukphy0: <Generic IEEE 802.3u media interface> on miibus0
ukphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
xl0: Ethernet address: 00:e0:81:22:2e:c4
xl1: <3Com 3c980C Fast Etherlink XL> port 0x2480-0x24ff mem 0xf4102400-0xf410247f irq 19 at device 9.0 on pci2
miibus1: <MII bus> on xl1
ukphy1: <Generic IEEE 802.3u media interface> on miibus1
ukphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
xl1: Ethernet address: 00:e0:81:22:2e:c5
atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
fdc0: <floppy drive controller> port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on acpi0
fdc0: does not respond
device_attach: fdc0 attach returned 6
fdc0: <floppy drive controller> port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on acpi0
fdc0: does not respond
device_attach: fdc0 attach returned 6
pmtimer0 on isa0
orm0: <ISA Option ROMs> at iomem 0xc0000-0xc7fff,0xc8000-0xc87ff,0xc8800-0xc8fff,0xe0000-0xe3fff on isa0
ppc0: parallel port not found.
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
sio0: configured irq 4 not in bitmap of probed irqs 0
sio0: port may not be enabled
sio0 at port 0x3f8-0x3ff irq 4 flags 0x10 on isa0
sio0: type 8250 or not responding
sio1: configured irq 3 not in bitmap of probed irqs 0
sio1: port may not be enabled
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
Timecounter "TSC" frequency 1800073530 Hz quality 800
Timecounters tick every 1.000 msec
hptrr: no controller detected.
Waiting 5 seconds for SCSI devices to settle
ad0: 476940MB <WDC WD5000AAKB-00UKA0 07.01N01> at ata0-master UDMA100
amr0: delete logical drives supported by controller
amrd0: <LSILogic MegaRAID logical drive> on amr0
amrd0: 139900MB (286515200 sectors) RAID 1 (optimal)
Trying to mount root from ufs:/dev/amrd0s1a

kldstat:

Id Refs Address    Size     Name
 1   10 0xc0400000 7a05b0   kernel
 2    1 0xc0ba1000 5c304    acpi.ko
 3    1 0xc8093000 3000     fdescfs.ko
 4    1 0xc8106000 3000     pflog.ko
 5    1 0xc8109000 2d000    pf.ko
 6    1 0xc817b000 19000    linux.ko

If you have any ideas, or if you need more information to diagnose the problem, please let me know.

Regards,

---
Danny Fullerton
Mantor Organization
_______________________________________________
freebsd-smp@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-smp
To unsubscribe, send any mail to "freebsd-smp-unsubscribe@freebsd.org"
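
A note on the dmesg in this post: several devices are tagged [GIANT-LOCKED], which appears to be how this FreeBSD release marks drivers whose interrupt handlers are not MP safe, the concern raised about the onboard Adaptec controller. The following is a minimal Python sketch, not part of the original report, that lists those devices from dmesg text. The function name `giant_locked_devices` and the `DMESG_SAMPLE` string are illustrative assumptions; the sample lines are copied from the output above.

```python
import re

# Sample lines copied from the dmesg in this post. On a live system the
# same text could be read from /var/run/dmesg.boot (assumed location).
DMESG_SAMPLE = """\
amr0: <LSILogic MegaRAID 1.53> mem 0xf6200000-0xf620ffff irq 20 at device 8.0 on pci0
ahc0: [GIANT-LOCKED]
ahc1: [GIANT-LOCKED]
ohci0: [GIANT-LOCKED]
xl0: Ethernet address: 00:e0:81:22:2e:c4
atkbd0: [GIANT-LOCKED]
"""

# Matches "<device>: [GIANT-LOCKED]" anywhere in the text, so it also works
# on a dmesg whose line breaks were lost when it was pasted.
GIANT_RE = re.compile(r"(\w+): \[GIANT-LOCKED\]")


def giant_locked_devices(dmesg_text):
    """Return the device names tagged [GIANT-LOCKED], in order, without duplicates."""
    devices = []
    for name in GIANT_RE.findall(dmesg_text):
        if name not in devices:
            devices.append(name)
    return devices


if __name__ == "__main__":
    for device in giant_locked_devices(DMESG_SAMPLE):
        print(device)
```

In the sample this reports the two ahc channels, the OHCI USB controller, and the AT keyboard, but not amr0, the controller that is actually in use.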