Hi Danny,

I ran some tests with the FreeBSD 7.0 prerelease in December and the problem is no longer present. The crashes and segfaults you see seem to be related to the ACPI/SMP implementation in FreeBSD 6. The problem is also present, and more evident, on VMware virtual hardware. There are no problems if you are using Intel hardware.

Hope this helps.

Daniel

-----Original message-----
From: Danny Fullerton <northox@mantor.org>
Date: Wed, 05 Mar 2008 04:32:03 +0100
To: freebsd-smp@freebsd.org
Subject: Re: Dual AMD MP unstable under heavy load when smp is active

> Hello Paul,
>
> I would like to know whether you ran those tests with the recent
> FreeBSD 7.0. I have seen lots of work in the SMP area of this release,
> and I'm wondering whether I would have a better chance with this version.
>
> thanks,
>
> dmesg with smp on (GENERIC + option smp):
>
> Copyright (c) 1992-2008 The FreeBSD Project.
> Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
> The Regents of the University of California. All rights reserved.
> FreeBSD is a registered trademark of The FreeBSD Foundation.
> FreeBSD 6.3-RELEASE-p1 #0: Wed Feb 27 21:11:40 EST 2008
> root@megatron.mantor.org:/usr/obj/usr/src/sys/MEGATRONTEST
> Timecounter "i8254" frequency 1193182 Hz quality 0
> CPU: AMD Athlon(tm) MP 2200+ (1800.07-MHz 686-class CPU)
> Origin = "AuthenticAMD" Id = 0x680 Stepping = 0
>
> Features=0x383fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE>
> AMD Features=0xc0480800<SYSCALL,MP,MMX+,3DNow!+,3DNow!>
> real memory = 3220701184 (3071 MB)
> avail memory = 3146387456 (3000 MB)
> ACPI APIC Table: <PTLTD APIC >
> FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
> cpu0 (BSP): APIC ID: 1
> cpu1 (AP): APIC ID: 0
> MADT: Forcing active-low polarity and level trigger for SCI
> ioapic0 <Version 1.1> irqs 0-23 on motherboard
> kbd1 at kbdmux0
> ath_hal: 0.9.20.3 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413, RF5413)
> hptrr: HPT RocketRAID controller driver v1.1 (Feb 27 2008 21:11:16)
> acpi0: <PTLTD RSDT> on motherboard
> acpi0: Power Button (fixed)
> acpi0: Sleep Button (fixed)
> Timecounter "ACPI-safe" frequency 3579545 Hz quality 850
> acpi_timer0: <24-bit timer at 3.579545MHz> port 0x8008-0x800b on acpi0
> cpu0: <ACPI CPU> on acpi0
> cpu1: <ACPI CPU> on acpi0
> acpi_button0: <Power Button> on acpi0
> pcib0: <ACPI Host-PCI bridge> port
> 0xcf8-0xcff,0x8000-0x807f,0x8080-0x80ff iomem 0xd8000-0xdbfff on acpi0
> pci0: <ACPI PCI bus> on pcib0
> agp0: <AMD 762 host to AGP bridge> port 0x1810-0x1813 mem
> 0xf8000000-0xfbffffff,0xf6210000-0xf6210fff at device 0.0 on pci0
> pcib1: <ACPI PCI-PCI bridge> at device 1.0 on pci0
> pci1: <ACPI PCI bus> on pcib1
> isab0: <PCI-ISA bridge> at device 7.0 on pci0
> isa0: <ISA bus> on isab0
> atapci0: <AMD 768 UDMA100 controller> port
> 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xf000-0xf00f at device 7.1 on pci0
> ata0: <ATA channel 0> on atapci0
> ata1: <ATA channel 1> on atapci0
> pci0: <bridge> at device 7.3 (no driver attached)
> amr0: <LSILogic MegaRAID 1.53> mem 0xf6200000-0xf620ffff irq 20 at
> device 8.0 on pci0
> amr0: delete logical drives supported by controller
> amr0: <LSILogic PERC 4/DC> Firmware 350O, BIOS 1.09, 128MB RAM
> ahc0: <Adaptec aic7899 Ultra160 SCSI adapter> port 0x1000-0x10ff mem
> 0xf4000000-0xf4000fff irq 20 at device 10.0 on pci0
> ahc0: [GIANT-LOCKED]
> aic7899: Ultra160 Wide Channel A, SCSI Id=7, 32/253 SCBs
> ahc1: <Adaptec aic7899 Ultra160 SCSI adapter> port 0x1400-0x14ff mem
> 0xf4001000-0xf4001fff irq 21 at device 10.1 on pci0
> ahc1: [GIANT-LOCKED]
> aic7899: Ultra160 Wide Channel B, SCSI Id=7, 32/253 SCBs
> pcib2: <ACPI PCI-PCI bridge> at device 16.0 on pci0
> pci2: <ACPI PCI bus> on pcib2
> ohci0: <OHCI (generic) USB controller> mem 0xf4100000-0xf4100fff irq 19
> at device 0.0 on pci2
> ohci0: [GIANT-LOCKED]
> usb0: OHCI version 1.0, legacy support
> usb0: SMM does not respond, resetting
> usb0: <OHCI (generic) USB controller> on ohci0
> usb0: USB revision 1.0
> uhub0: AMD OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
> uhub0: 4 ports with 4 removable, self powered
> pci2: <display, VGA> at device 7.0 (no driver attached)
> xl0: <3Com 3c980C Fast Etherlink XL> port 0x2400-0x247f mem
> 0xf4102000-0xf410207f irq 18 at device 8.0 on pci2
> miibus0: <MII bus> on xl0
> ukphy0: <Generic IEEE 802.3u media interface> on miibus0
> ukphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
> xl0: Ethernet address: 00:e0:81:22:2e:c4
> xl1: <3Com 3c980C Fast Etherlink XL> port 0x2480-0x24ff mem
> 0xf4102400-0xf410247f irq 19 at device 9.0 on pci2
> miibus1: <MII bus> on xl1
> ukphy1: <Generic IEEE 802.3u media interface> on miibus1
> ukphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
> xl1: Ethernet address: 00:e0:81:22:2e:c5
> atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
> atkbd0: <AT Keyboard> irq 1 on atkbdc0
> kbd0 at atkbd0
> atkbd0: [GIANT-LOCKED]
> fdc0: <floppy drive controller> port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on acpi0
> fdc0: does not respond
> device_attach: fdc0 attach returned 6
> fdc0: <floppy drive controller> port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on acpi0
> fdc0: does not respond
> device_attach: fdc0 attach returned 6
> pmtimer0 on isa0
> orm0: <ISA Option ROMs> at iomem
> 0xc0000-0xc7fff,0xc8000-0xc87ff,0xc8800-0xc8fff,0xe0000-0xe3fff on isa0
> ppc0: parallel port not found.
> sc0: <System console> at flags 0x100 on isa0
> sc0: VGA <16 virtual consoles, flags=0x300>
> sio0: configured irq 4 not in bitmap of probed irqs 0
> sio0: port may not be enabled
> sio0 at port 0x3f8-0x3ff irq 4 flags 0x10 on isa0
> sio0: type 8250 or not responding
> sio1: configured irq 3 not in bitmap of probed irqs 0
> sio1: port may not be enabled
> vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
> Timecounters tick every 1.000 msec
> hptrr: no controller detected.
> Waiting 5 seconds for SCSI devices to settle
> ad0: 476940MB <WDC WD5000AAKB-00UKA0 07.01N01> at ata0-master UDMA100
> amr0: delete logical drives supported by controller
> amrd0: <LSILogic MegaRAID logical drive> on amr0
> amrd0: 139900MB (286515200 sectors) RAID 1 (optimal)
> SMP: AP CPU #1 Launched!
> Trying to mount root from ufs:/dev/amrd0s1a
>
> ---
> Danny Fullerton
> Mantor Organization
>
> Paul Missman wrote:
> >
> > Danny,
> >
> > I don't know what the bug is, but it does exist.
> >
> > I have an IBM x3455 with 2 dual-core Opteron processors. Under heavy
> > loads it crashes. As a step in debugging, I unplugged one of the
> > processors, and the problem went away. I switched to CentOS version
> > 4, and it operates perfectly.
> >
> > In addition to FreeBSD, the problem also exists in Fedora Core.
> >
> > Of the OSes I tested, only Red Hat and CentOS worked correctly on the
> > x3455.
> >
> > I didn't try Windows, so I can't say whether or not it operates
> > properly on this system.
> >
> > Unfortunately, that is all I know about the issue.
> >
> > Paul Missman
> >
> >
> > ----- Original Message ----- From: "Danny Fullerton" <northox@mantor.org>
> > To: <freebsd-smp@freebsd.org>
> > Sent: Tuesday, March 04, 2008 9:05 PM
> > Subject: Dual AMD MP unstable under heavy load when smp is active
> >
> >
> >> Hi guys,
> >>
> >> I have been having quite some trouble tracking down a problem that
> >> seems to be related to SMP on one of my production servers.
> >>
> >> The problem is not easily reproducible, but the best way I have found
> >> is to fire up "make buildworld" while other things are running
> >> (mysql, apache, bind, jails, etc.). When SMP is active, the compile
> >> ends with a segfault or, quite rarely, a crash. I recently configured
> >> the crash dump device but have not yet been able to reproduce a full
> >> system crash.
> >>
> >> At first I thought it was related to the memory, so I ran some tests
> >> and replaced most of the DIMMs, but the problem was still there. To
> >> pinpoint the problem, I started from the GENERIC kernel, which I found
> >> to be stable, and added options to it. That's how I found that it was
> >> related to SMP. I then tried a few other things, such as reducing the
> >> drivers in the kernel to the minimum I could, for several reasons. One
> >> of them is that the motherboard is a "Tyan Thunder K7X"
> >> (http://www.tyan.com/archive/products/html/thunderk7x.html) and it has
> >> an onboard Adaptec SCSI controller which I don't use. Since the driver
> >> for this adapter is not MP-safe, I tried disabling the controller in
> >> the BIOS and/or removing the driver from the kernel, but it had no
> >> effect. The SCSI adapter actually in use is the Dell 4/DC (LSILogic
> >> MegaRAID) you can see in the dmesg.
> >>
> >> Now I have no clue how to debug this problem further.
> >>
> >> dmesg from generic kernel:
> >>
> >> Copyright (c) 1992-2008 The FreeBSD Project.
> >> Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
> >> The Regents of the University of California. All rights reserved.
> >> FreeBSD is a registered trademark of The FreeBSD Foundation.
> >> FreeBSD 6.3-RELEASE-p1 #0: Wed Feb 27 07:56:51 EST 2008
> >> root@megatron.mantor.org:/usr/obj/usr/src/sys/GENERIC
> >> ACPI APIC Table: <PTLTD APIC >
> >> Timecounter "i8254" frequency 1193182 Hz quality 0
> >> CPU: AMD Athlon(tm) MP 2200+ (1800.07-MHz 686-class CPU)
> >> Origin = "AuthenticAMD" Id = 0x680 Stepping = 0
> >>
> >> Features=0x383fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE>
> >>
> >> AMD Features=0xc0480800<SYSCALL,MP,MMX+,3DNow!+,3DNow!>
> >> real memory = 3220701184 (3071 MB)
> >> avail memory = 3150741504 (3004 MB)
> >> MADT: Forcing active-low polarity and level trigger for SCI
> >> ioapic0 <Version 1.1> irqs 0-23 on motherboard
> >> kbd1 at kbdmux0
> >> ath_hal: 0.9.20.3 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413,
> >> RF5413)
> >> hptrr: HPT RocketRAID controller driver v1.1 (Feb 27 2008 07:56:28)
> >> acpi0: <PTLTD RSDT> on motherboard
> >> acpi0: Power Button (fixed)
> >> acpi0: Sleep Button (fixed)
> >> Timecounter "ACPI-safe" frequency 3579545 Hz quality 850
> >> acpi_timer0: <24-bit timer at 3.579545MHz> port 0x8008-0x800b on acpi0
> >> cpu0: <ACPI CPU> on acpi0
> >> acpi_button0: <Power Button> on acpi0
> >> pcib0: <ACPI Host-PCI bridge> port
> >> 0xcf8-0xcff,0x8000-0x807f,0x8080-0x80ff iomem 0xd8000-0xdbfff on acpi0
> >> pci0: <ACPI PCI bus> on pcib0
> >> agp0: <AMD 762 host to AGP bridge> port 0x1810-0x1813 mem
> >> 0xf8000000-0xfbffffff,0xf6210000-0xf6210fff at device 0.0 on pci0
> >> pcib1: <ACPI PCI-PCI bridge> at device 1.0 on pci0
> >> pci1: <ACPI PCI bus> on pcib1
> >> isab0: <PCI-ISA bridge> at device 7.0 on pci0
> >> isa0: <ISA bus> on isab0
> >> atapci0: <AMD 768 UDMA100 controller> port
> >> 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xf000-0xf00f at device 7.1 on pci0
> >> ata0: <ATA channel 0> on atapci0
> >> ata1: <ATA channel 1> on atapci0
> >> pci0: <bridge> at device 7.3 (no driver attached)
> >> amr0: <LSILogic MegaRAID 1.53> mem 0xf6200000-0xf620ffff irq 20 at
> >> device 8.0 on pci0
> >> amr0: delete logical drives supported by controller
> >> amr0: <LSILogic PERC 4/DC> Firmware 350O, BIOS 1.09, 128MB RAM
> >> ahc0: <Adaptec aic7899 Ultra160 SCSI adapter> port 0x1000-0x10ff mem
> >> 0xf4000000-0xf4000fff irq 20 at device 10.0 on pci0
> >> ahc0: [GIANT-LOCKED]
> >> aic7899: Ultra160 Wide Channel A, SCSI Id=7, 32/253 SCBs
> >> ahc1: <Adaptec aic7899 Ultra160 SCSI adapter> port 0x1400-0x14ff mem
> >> 0xf4001000-0xf4001fff irq 21 at device 10.1 on pci0
> >> ahc1: [GIANT-LOCKED]
> >> aic7899: Ultra160 Wide Channel B, SCSI Id=7, 32/253 SCBs
> >> pcib2: <ACPI PCI-PCI bridge> at device 16.0 on pci0
> >> pci2: <ACPI PCI bus> on pcib2
> >> ohci0: <OHCI (generic) USB controller> mem 0xf4100000-0xf4100fff irq 19
> >> at device 0.0 on pci2
> >> ohci0: [GIANT-LOCKED]
> >> usb0: OHCI version 1.0, legacy support
> >> usb0: SMM does not respond, resetting
> >> usb0: <OHCI (generic) USB controller> on ohci0
> >> usb0: USB revision 1.0
> >> uhub0: AMD OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
> >> uhub0: 4 ports with 4 removable, self powered
> >> pci2: <display, VGA> at device 7.0 (no driver attached)
> >> xl0: <3Com 3c980C Fast Etherlink XL> port 0x2400-0x247f mem
> >> 0xf4102000-0xf410207f irq 18 at device 8.0 on pci2
> >> miibus0: <MII bus> on xl0
> >> ukphy0: <Generic IEEE 802.3u media interface> on miibus0
> >> ukphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
> >> xl0: Ethernet address: 00:e0:81:22:2e:c4
> >> xl1: <3Com 3c980C Fast Etherlink XL> port 0x2480-0x24ff mem
> >> 0xf4102400-0xf410247f irq 19 at device 9.0 on pci2
> >> miibus1: <MII bus> on xl1
> >> ukphy1: <Generic IEEE 802.3u media interface> on miibus1
> >> ukphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
> >> xl1: Ethernet address: 00:e0:81:22:2e:c5
> >> atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
> >> atkbd0: <AT Keyboard> irq 1 on atkbdc0
> >> kbd0 at atkbd0
> >> atkbd0: [GIANT-LOCKED]
> >> fdc0: <floppy drive controller> port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on
> >> acpi0
> >> fdc0: does not respond
> >> device_attach: fdc0 attach returned 6
> >> fdc0: <floppy drive controller> port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on
> >> acpi0
> >> fdc0: does not respond
> >> device_attach: fdc0 attach returned 6
> >> pmtimer0 on isa0
> >> orm0: <ISA Option ROMs> at iomem
> >> 0xc0000-0xc7fff,0xc8000-0xc87ff,0xc8800-0xc8fff,0xe0000-0xe3fff on isa0
> >> ppc0: parallel port not found.
> >> sc0: <System console> at flags 0x100 on isa0
> >> sc0: VGA <16 virtual consoles, flags=0x300>
> >> sio0: configured irq 4 not in bitmap of probed irqs 0
> >> sio0: port may not be enabled
> >> sio0 at port 0x3f8-0x3ff irq 4 flags 0x10 on isa0
> >> sio0: type 8250 or not responding
> >> sio1: configured irq 3 not in bitmap of probed irqs 0
> >> sio1: port may not be enabled
> >> vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on
> >> isa0
> >> Timecounter "TSC" frequency 1800073530 Hz quality 800
> >> Timecounters tick every 1.000 msec
> >> hptrr: no controller detected.
> >> Waiting 5 seconds for SCSI devices to settle
> >> ad0: 476940MB <WDC WD5000AAKB-00UKA0 07.01N01> at ata0-master UDMA100
> >> amr0: delete logical drives supported by controller
> >> amrd0: <LSILogic MegaRAID logical drive> on amr0
> >> amrd0: 139900MB (286515200 sectors) RAID 1 (optimal)
> >> Trying to mount root from ufs:/dev/amrd0s1a
> >>
> >> kldstat:
> >>
> >> Id Refs Address    Size     Name
> >>  1   10 0xc0400000 7a05b0   kernel
> >>  2    1 0xc0ba1000 5c304    acpi.ko
> >>  3    1 0xc8093000 3000     fdescfs.ko
> >>  4    1 0xc8106000 3000     pflog.ko
> >>  5    1 0xc8109000 2d000    pf.ko
> >>  6    1 0xc817b000 19000    linux.ko
> >>
> >> If you have any ideas, or need more information to diagnose the
> >> problem, please let me know.
> >>
> >> regards,
> >>
> >> ---
> >> Danny Fullerton
> >> Mantor Organization
> >> _______________________________________________
> >> freebsd-smp@freebsd.org mailing list
> >> http://lists.freebsd.org/mailman/listinfo/freebsd-smp
> >> To unsubscribe, send any mail to "freebsd-smp-unsubscribe@freebsd.org"