On 21.07.2010, at 10:33, Andriy Gapon wrote:
> on 21/07/2010 03:57 Markus Gebert said the following:
>> Another thing though: Today I compared verbose boot output from 8-stable
>> and the current box. I saw that the ioapic sets up IRQ routing differently
>> on these two systems although the hardware is the same. This seemed not so
>> interesting at first, but then I noticed that 8-stable sets up two routes
>> (to lapic0 and lapic2, or sometimes lapic3) for IRQ58 (mpt0), while current
>> only uses one route (to lapic0).
>
> My understanding is that it's not "two routes", but re-routing.
> During early boot all interrupts are bound to the BSP; later, when the APs
> come online, the interrupts are redistributed among the available CPUs.

I guess you're right; that was a misinterpretation on my side. Thanks for
clarifying this.

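To make the re-routing explanation concrete, here is a toy Python model. It is hypothetical and not FreeBSD's interrupt code: the round-robin redistribution step is an assumption made for illustration, and every IRQ number except 58 (mpt0) is made up. Every source is bound to CPU 0 during early boot and rebound once the APs are online, so a verbose-boot-style log lists IRQ58 twice. That is one re-route, not two routes in parallel.

```python
# Toy model (hypothetical, not FreeBSD kernel code) of two-phase interrupt
# binding: while only the BSP runs, every IRQ is bound to CPU 0; once the
# APs are online, the IRQs are redistributed (assumed round-robin here).

def bind_at_boot(irqs):
    """Phase 1: every IRQ is routed to the BSP (CPU 0)."""
    return [(irq, 0) for irq in irqs]


def redistribute(irqs, ncpus):
    """Phase 2 (assumed policy): rebind each IRQ to the next CPU in turn."""
    return [(irq, i % ncpus) for i, irq in enumerate(irqs)]


def boot_log(irqs, ncpus):
    """Return the sequence of (irq, cpu) bindings, in the order they happen."""
    return bind_at_boot(irqs) + redistribute(irqs, ncpus)


if __name__ == "__main__":
    # Hypothetical IRQ list; only IRQ 58 (mpt0) comes from the thread.
    for irq, cpu in boot_log([1, 9, 58], ncpus=4):
        print(f"IRQ{irq} -> cpu{cpu}")
```

With this sample list and four CPUs, IRQ58 is first logged on cpu0 and later on cpu2: the same interrupt, bound twice over time.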
Now that I'm aware of this, it seems to me that in the
machdep.lapic_allclocks=0 case there might just be more interrupts to be
assigned/routed because "more clocks are being used". If that's true, maybe
it's just "luck" that in this case the mpt interrupt gets assigned to
lapic0/cpu0 and the box runs fine. I'm just guessing, though, since I have no
clue how exactly interrupts are assigned to lapics (round-robin? some other
logic?).

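The "luck" theory can be sketched as a toy Python model. It assumes (not verified against the kernel) that interrupts are handed to CPUs round-robin in a fixed order, so every extra interrupt source assigned before mpt0 shifts the CPU mpt0 ends up on. The IRQ lists and CPU count are hypothetical; IRQ 0 and IRQ 8 merely stand in for the additional clock interrupts.

```python
# Toy model of the "luck" theory (an assumption, not FreeBSD's actual
# assignment code): IRQs are handed out round-robin in a fixed order, so
# extra interrupt sources assigned ahead of mpt0 change the CPU it lands on.

def assign_round_robin(irqs, ncpus):
    """Map each IRQ to a CPU index, cycling 0, 1, ..., ncpus-1 in list order."""
    return {irq: i % ncpus for i, irq in enumerate(irqs)}


MPT_IRQ = 58  # mpt0's IRQ, as reported in the thread
NCPUS = 4     # hypothetical: four CPUs / lapics

# Hypothetical interrupt lists; the second adds two clock interrupts
# (IRQ 0 and IRQ 8) to stand in for "more clocks being used".
default_irqs = [1, 9, 58]
more_clock_irqs = [0, 1, 8, 9, 58]

if __name__ == "__main__":
    cpu = assign_round_robin(default_irqs, NCPUS)[MPT_IRQ]
    print(f"default:     mpt0 -> cpu{cpu}")
    cpu = assign_round_robin(more_clock_irqs, NCPUS)[MPT_IRQ]
    print(f"more clocks: mpt0 -> cpu{cpu}")
```

Here mpt0 lands on cpu2 with the shorter list and on cpu0 once two more sources are assigned ahead of it. Under such a policy, which CPU a device ends up on depends only on how many sources are assigned before it.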
>> I used 'cpuset -c -l 0 -x 58' in an attempt to make my 8-stable box behave
>> like the one running current. Indeed, this seems to have changed IRQ58 to
>> be routed to lapic0 only. And the box was running for hours without showing
>> the symptoms.
>>
>> I just checked the verbose boot output of my 8-stable box again (booted
>> with machdep.lapic_allclocks=0 as mentioned above). And now it seems to
>> have set up IRQ routes just like the current box (one route for IRQ58 to
>> lapic0).
>
> Not sure how to interpret this properly.
> One possibility is a hardware problem where the interrupt message route
> between ioapic2 and the CPU to which lapic3 belongs is flaky.
> Or it might be a FreeBSD problem: it could be that the system somehow tells
> us not to set up such routes, but we don't listen. But this is far-fetched.

I'm not sure either. If my "theory" above proves to be true, it would have
been just luck that 6.x and 7.x (and current) run fine on the X4100M2. A
(short) test on Ubuntu didn't trigger the problem, so either the Linux kernel
is lucky too and happens to select an interrupt route that is "not flaky", or
there is indeed some way to find out that certain lapics should not be used
for certain interrupts. Or we didn't test Linux thoroughly enough.
Markus
_______________________________________________
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"