From: dillon@apollo.backplane.com (Matthew Dillon)

Board: FB_smp

Subject: Re: Syscall contention tests return, userret() bugs/issues.

Posted from: NCTU CSIE FreeBSD Server (Fri May 5 12:26:49 2000)

Relay path: Ptt!FreeBSD.csie.NCTU!not-for-mail

:> >    And do a comparative syscall rate test on a two-cpu system running
:> >    two getuid() processes, this happens:
:> >
:> >                            1 process       2 processes
:> >    w/PCPU:                 1004000         1000000
:> >    w/++cnt.v_syscall:      1004000         853000
:>
:> But is this a relevant test-case to optimize for ?
:>
:> We are trying to eliminate the need for all often-used trivial syscalls
:> to get into the kernel in the first place, and for non-trivial syscalls
:> it doesn't matter a hoot how that increment is done...
:
:For builds, I would think that the really relevant test case might be a
:zero-byte loop-back pipe write / read pair. This would still be a single
:process, but would optimize handling of a system call that appears to be
:highly relevant to the build process. At least last I heard, Peter had
:identified pipe operations (pre-alfredpipe) as being one of the big issues
:in a parallel build due to make's use of pipes for IPC in frequent and
:small intervals. I don't know if he's run the numbers since then --
:one benefit to moving the Giant grabbing to inside #ifdef ktrace would be
:that we might be able to do better benchmarking of the pipe case, which
:Alfred has told me hasn't improved much (possibly for this reason), if
:only in experimental code. That should demonstrate the performance impact
:of the fine-grained locking that we believe should be there.
:
:That said, if getuid as the example micro-benchmark can be demonstrated to
:causally affect the macro-benchmark, then the selection of
:micro-benchmark by implementation facility sounds reasonable to me. :-)
:Matt's original post used 1-process and 2-process build pairs in a
:macro-benchmark style, so I imagine all is set on that front, since he'd
:demonstrated that related contention existed in that path, and done
:experimental instrumentation that noted a similar performance impact in
:the macro-benchmark from the micro-benchmark. Before commit time comes,
:clearly the macros need to run and demonstrate happiness, of course.
:
:Robert N M Watson             FreeBSD Core Team, TrustedBSD Project
:robert@fledge.watson.org      NAI Labs, Safeport Network Services

    Well, I thought I was fairly clear but I'll describe it from a
    different direction.

    This benchmark focuses on memory contention occurring in the portion
    of the system call that is common to ALL system calls ... the lowest
    level 'critical path' we have in system call management. The
    benchmark is definitive for this case, and only this case.

    Using the simplest system call I can find, this benchmark finds all
    places where contention is occurring in the common system call code
    and demonstrates not only the degradation that occurs when two CPUs
    are doing unrelated system calls, but also demonstrates that *ALL*
    areas of contention in the common code have been found.

    Additionally, the benchmark demonstrates the effect of cache
    contention on the stats counters definitively and allows us to
    theorize, using our knowledge of how memory cache invalidation works,
    that the contention will be even *WORSE* as we add CPUs (i.e. the
    existing syscall common code does not scale with the number of CPUs
    in the system).

    As a side note, no benchmark focusing on the piping code will be
    entirely definitive until other areas of contention are fixed first.
    That isn't to say that we can't clean up the piping code, just that
    we cannot definitively demonstrate all the remaining areas of
    contention in the piping code until we've cleaned up the areas of
    contention in the common syscall code.

    The pipe code is an obvious next step to take, after the common path
    is cleaned up. The common path is an obvious first step, since it
    affects *ALL* the system calls.

                                        -Matt
                                        Matthew Dillon
                                        <dillon@backplane.com>
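
    For readers who want to try the kind of measurement discussed above,
    here is a minimal userland sketch of a getuid() rate test. It is not
    the program used to produce the numbers quoted in this thread; the
    function names getuid_rate() and parallel_getuid_rates() are
    hypothetical, and the children are started one after another, so
    their timed loops only roughly overlap.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Time `iterations` getuid() calls; return the rate in calls per second. */
double getuid_rate(long iterations)
{
    struct timespec start, end;
    volatile uid_t sink = 0;    /* keeps the compiler from dropping the loop */

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (long i = 0; i < iterations; i++)
        sink = getuid();
    clock_gettime(CLOCK_MONOTONIC, &end);
    (void)sink;

    double secs = (double)(end.tv_sec - start.tv_sec) +
                  (double)(end.tv_nsec - start.tv_nsec) / 1e9;
    return secs > 0.0 ? (double)iterations / secs : 0.0;
}

/*
 * Fork `nproc` children that each run the timed loop, and collect one
 * rate per child into rates[] (in completion order, not fork order).
 * Returns 0 on success, -1 on error.
 */
int parallel_getuid_rates(int nproc, long iterations, double *rates)
{
    int fds[2];

    if (pipe(fds) != 0)
        return -1;

    for (int i = 0; i < nproc; i++) {
        pid_t pid = fork();
        if (pid < 0) {
            close(fds[0]);
            close(fds[1]);
            return -1;
        }
        if (pid == 0) {
            /* Child: measure, report one double through the pipe, exit. */
            close(fds[0]);
            double r = getuid_rate(iterations);
            ssize_t n = write(fds[1], &r, sizeof r);
            _exit(n == (ssize_t)sizeof r ? 0 : 1);
        }
    }
    close(fds[1]);

    /* Each child writes exactly sizeof(double) bytes in one small write. */
    int got = 0;
    while (got < nproc &&
           read(fds[0], &rates[got], sizeof rates[got]) ==
               (ssize_t)sizeof rates[got])
        got++;
    close(fds[0]);

    while (wait(NULL) > 0)
        ;   /* reap all children */
    return got == nproc ? 0 : -1;
}
```

    Calling parallel_getuid_rates() once with nproc = 1 and once with
    nproc = 2 on a two-cpu machine, and comparing the per-process rates,
    gives the same shape of comparison as the table quoted above. What
    the numbers look like depends entirely on the kernel and hardware
    being tested.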