In message <Pine.NEB.3.96L.1020330082409.73912V-100000@fledge.watson.org>, Robert Watson writes:
>That said, if getuid as the example micro-benchmark can be demonstrated to
>causally affect/optimize the macro-benchmark, then the selection of
>micro-benchmark by implementation facility sounds reasonable to me. :-)
Well, my gripe with microbenchmarks like this is that they are very
very very hard to get right.
Matt obviously didn't get it right, as he himself noticed: one
testcase ran faster despite the fact that it was doing more work.
This means that the behaviour of caches (of all sorts) was a larger
factor than his particular change to the code.
The elimination (in practice or by calculation) of the effects of
caches on microbenchmarks is by now a science unto itself.
I am very afraid that we will see people optimize for the cache footprint
of their microbenchmarks rather than for the microbenchmarks themselves.
Remember how Linux optimized for the wrong parameters because of
lmbench?
We don't want to go there...
The only credible way to get a sensible result from a micro-benchmark
that can be extrapolated to macro performance involves adding a
known or predictable, varying entropy load as a jitter factor and using
long integration times (>6 hours). That automatically takes you
into the territory of temperature stabilization, atomic-referenced
clock signals, etc.
And quite frankly, having gone there and come back I can personally
tell you that life isn't long enough for that.
(And no, just disabling caches is not a solution, because then you
are not putting the CPU in a representative memory environment
anymore; that's like benchmarking car performance only in 1st gear.)
So right now I think that our requirement for doing optimizations
should be:
1. It simplifies the code significantly.
or
2. It carries an undisputed theoretical improvement.
or
3. It gives a statistically significant macroscopic improvement
in a (reasonably) well-defined workload of relevance.
The practical guide to execute #3 should be:
A = Time reference code
B = Time modified code
C = Time reference code
D = Time modified code
Unless both B and D are lower than both A and C, it will take a lot
of carefully controlled test-runs to prove that there is a statistically
significant improvement (standard deviations and all that...).
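For illustration, here is a rough sketch of that A/B/C/D procedure in
Python; the binary names ./bench-ref and ./bench-mod are only placeholders
for whatever reference and modified test programs you actually build:

    #!/usr/bin/env python3
    # Rough sketch of the A/B/C/D timing procedure described above.
    # ./bench-ref and ./bench-mod are placeholders, not real programs.
    import subprocess
    import time

    def wall_time(cmd):
        # Run the command once and return its wall-clock time in seconds.
        start = time.perf_counter()
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        return time.perf_counter() - start

    REF = ["./bench-ref"]   # A, C: reference (unmodified) code
    MOD = ["./bench-mod"]   # B, D: modified code

    a = wall_time(REF)
    b = wall_time(MOD)
    c = wall_time(REF)
    d = wall_time(MOD)

    print("A=%.3fs B=%.3fs C=%.3fs D=%.3fs" % (a, b, c, d))

    # Only claim an obvious win if both modified runs beat both
    # reference runs; anything less calls for many controlled runs
    # and proper statistics.
    if max(b, d) < min(a, c):
        print("modified code looks faster in this crude test")
    else:
        print("no clear win -- collect many runs and do the statistics")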
--
Poul-Henning Kamp | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG | TCP/IP since RFC 956
FreeBSD committer | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.