In message <Pine.NEB.3.96L.1020330082409.73912V-100000@fledge.watson.org>,
Robert Watson writes:

>That said, if getuid as the example micro-benchmark can be demonstrated to
>causally affect optimize the macro-benchmark, then the selection of
>micro-benchmark by implementation facility sounds reasonable to me. :-)

Well, my gripe with microbenchmarks like this is that they are very,
very, very hard to get right. Matt obviously didn't get it right, as he
himself noticed: one testcase ran faster despite the fact that it was
doing more work. This means that the behaviour of caches (of all sorts)
was a larger factor than his particular change to the code.

The elimination (practically or by calculation) of the effects of
caches on microbenchmarks is by now a science unto itself.

I am very afraid that we will see people optimize for the
cache-footprint of their microbenchmarks rather than their
microbenchmarks themselves. Remember how Linux optimized for the wrong
parameters because of lmbench? We don't want to go there...

The only credible way to get sensible results from a microbenchmark
that can be extrapolated to macro performance involves adding a known
or predictable, varying entropy load as a jitter factor and using long
integration times (>6 hours). That automatically takes you into the
territory of temperature stabilization, atomic-referenced clock signals
etc. And quite frankly, having gone there and come back, I can
personally tell you that life isn't long enough for that.

(And no, just disabling caches is not a solution, because then you are
not putting the CPU in a representative memory environment anymore;
that's like benchmarking car performance only in 1st gear.)

So right now I think that our requirement for doing optimizations
should be:

1. It simplifies the code significantly.
or
2. It carries undisputed theoretical improvement.
or
3. It gives a statistically significant macroscopic improvement
   in a (reasonably) well-defined workload of relevance.

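As a rough illustration of what "statistically significant" in
criterion 3 could mean in practice, here is a minimal sketch of Welch's
unequal-variance t-test applied to two sets of wall-clock timings. The
choice of Welch's test, the function name welch_t and its arguments are
assumptions made for this example; they are not taken from the message.

```python
import math
import statistics


def welch_t(reference, modified):
    """Return (t, df) for Welch's t-test on two lists of run times.

    Hypothetical helper for illustration only: `reference` and
    `modified` are elapsed times from repeated runs of the same
    workload on the old and new code.
    """
    n1, n2 = len(reference), len(modified)
    if n1 < 2 or n2 < 2:
        raise ValueError("need at least two runs per variant")

    m1, m2 = statistics.mean(reference), statistics.mean(modified)
    # Sample variances (n - 1 in the denominator).
    v1, v2 = statistics.variance(reference), statistics.variance(modified)

    # Squared standard error of each mean.
    se1, se2 = v1 / n1, v2 / n2
    if se1 + se2 == 0:
        raise ValueError("both samples have zero variance")

    t = (m1 - m2) / math.sqrt(se1 + se2)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t, df
```

A positive t means the modified code had the lower mean time. Whether
the difference counts as significant still depends on comparing |t|
against the critical value of the t distribution at df degrees of
freedom for a chosen confidence level, and on the runs having been made
under comparable conditions in the first place.
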
The practical guide to execute #3 should be:

	A = Time reference code
	B = Time modified code
	C = Time reference code
	D = Time modified code

Unless both A and C are lower than both B and D it will take a lot of
carefully controlled test-runs to prove that there is a statistically
significant improvement (standard deviations and all that...)

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-smp" in the body of the message