:> > And do a comparative syscall rate test on a two-cpu system running
:> > two getuid() processes, this happens:
:> >
:> >                       1 process    2 processes
:> > w/PCPU:               1004000      1000000
:> > w/++cnt.v_syscall:    1004000      853000
:>
:> But is this a relevant test-case to optimize for ?
:>
:> We are trying to eliminate the need for all often-used trivial syscalls
:> to get into the kernel in the first place, and for non-trivial syscalls it
:> doesn't matter a hoot how that increment is done...
:
:For builds, I would think that the really relevant test case might be a
:zero-byte loop-back pipe write / read pair. This would still be a single
:process, but would optimize handling of a system call that appears to be
:highly relevant to the build process. At least last I heard, Peter had
:identified pipe operations (pre-alfredpipe) as being one of the big issues
:in a parallel build due to make's use of pipes for IPC in frequent and
:small intervals. I don't know if he's run the numbers since then --
:one benefit to moving the Giant grabbing to inside #ifdef ktrace would be
:that we might be able to do better benchmarking of the pipe case, which
:Alfred has told me hasn't improved much (possibly for this reason), if
:only in experimental code. That should demonstrate the performance impact
:of the fine-grained locking that we believe should be there.
:
:That said, if getuid as the example micro-benchmark can be demonstrated to
:causally affect the macro-benchmark, then the selection of
:micro-benchmark by implementation facility sounds reasonable to me. :-)
:Matt's original post used 1-process and 2-process build pairs in a
:macro-benchmark style, so I imagine all is set on that front, since he'd
:demonstrated that related contention existed in that path, and done
:experimental instrumentation that noted a similar performance impact in
:the macro-benchmark from the micro-benchmark. Before commit time comes,
:clearly the macros need to run and demonstrate happiness, of course.
:
:Robert N M Watson             FreeBSD Core Team, TrustedBSD Project
:robert@fledge.watson.org      NAI Labs, Safeport Network Services

Well, I thought I was fairly clear, but I'll describe it from a
different direction. This benchmark focuses on memory contention
occurring in the portion of the system call that is common to ALL
system calls ... the lowest level 'critical path' we have in system
call management. The benchmark is definitive for this case, and only
this case. Using the simplest system call I can find, this benchmark
finds all places where contention is occurring in the common system call
code and demonstrates not only the degradation that occurs when two
CPUs are doing unrelated system calls, but also demonstrates that
*ALL* areas of contention in the common code have been found.
Additionally, the benchmark demonstrates the effect of cache contention
on the stats counters definitively and allows us to theorize, using
our knowledge of how memory cache invalidation works, that the
contention will be even *WORSE* as we add CPUs (i.e. the existing
syscall common code does not scale with the number of CPUs in the
system).

As a side note, no benchmark focusing on the piping code will be
entirely definitive until other areas of contention are fixed first.
That isn't to say that we can't clean up the piping code, just that
we cannot definitively demonstrate all the remaining areas of contention
in the piping code until we've cleaned up the areas of contention in
the common syscall code. The pipe code is an obvious next step to
take, after the common path is cleaned up. The common path is an
obvious first step, since it affects *ALL* the system calls.

-Matt
Matthew Dillon
<dillon@backplane.com>