Ok, now that the last mess has been dealt with I have resumed my
examination of the critical path.
NOTE!!!!! While I do want to fix this stuff, the patch included below
is *NOT* a fix, it exists for testing purposes only and is missing
things like, oh, necessary memory barriers for certain architectures.
This examination has also turned up two possible bugs in userret(),
which I describe later.
CONTENTION TESTS
My main goal is to try to remove all memory contention from the syscall,
interrupt, and trap critical paths for two processes on an SMP build.
I am focusing on the syscall path right now.
The recent td_ucred stuff fixed some of this. We still have three
pieces of contended memory in the standard syscall path. These are
fairly important because the contention occurs in the critical path
for every system call made.
* the syscall counter
* userret() / CURSIG handling
* userret() / NEEDRESCHED handling
I have rerun my contention tests with the most recent kernel + some
hacks (FOR TESTING PURPOSES ONLY RIGHT NOW!). The kernel was compiled
*without* WITNESS or DIAGNOSTIC.
                           [------- SMP build ------]
                            1-Process    2-Process

  atomic syscall counter     860836       699591    -18.7%   NOTE A
  no syscall counter         862822       768512    -10.9%   NOTE A
  per-cpu syscall counter    882311       786000    -10.9%   NOTE A
  per-cpu syscall counter   1004050      1001611     -0.2%   NOTE B
NOTE A: locking removed from userret for CURSIG case
NOTE B: locking removed from userret for CURSIG *and* NEEDRESCHED test
There are two interesting pieces of information I get out of this
test. First, the difference between the atomic-locked syscall counter
and the per-cpu syscall counter in the one-process case is fairly
minor, but it blows up in our faces the moment one runs more
than one process. Specifically, that one atomic op results in an 8.5%
loss in performance when two CPUs are contending for the global. This
loss is cumulative, so you can imagine the loss on a 4-cpu system.
For some odd reason completely unknown to me, performance actually
increased when I went from *NO* syscall counter to the per-cpu syscall
counter! I have no clue as to why. I tested it several times!
The second interesting piece of information is the performance gain
that occurs when I don't obtain the sched_lock in userret() for the
initial NEEDRESCHED test (I'm already not obtaining Giant in the CURSIG
code in userret() for any of these tests). Performance increases
radically, by almost 16.3%, in the single-process case and, more
importantly, the two-process case shows that there is no longer
any contention between the processors. Not only that, but I think
we can be proud of the fact that it is possible to achieve
sub-1us syscall overheads under SMP (well, the test box is a 1.1GHz
machine :-)). It means that we are doing something right.
WORKING ON THE ISSUE
My conclusion is that it should be possible for us to remove critical
path contention, at least for system calls, in fairly short order. I would
like to pursue this - that is, find real solutions for the three areas
that my hacks show to be an issue (in regard to system calls).
So I am opening up debate on how to fix these areas:
* system call counter (and counters in the kernel in general)
My current hack does not properly protect the counter. Properly
speaking, since the increment is not a single instruction, the per-cpu
counter must be protected by critical_enter()/critical_exit().
I am thinking of adding a PCPU_ADD_INT() macro to all architectures
and putting all 'global' system counters in the per-cpu area, i.e.
embedding the struct vmmeter structure in the per-cpu area. sysctl
would then combine the information from all the per-cpu areas (e.g. for
systat / vmstat reporting).
( I am not covering things like interface counters here, just global
system counters ).
Comments?
* userret / CURSIG case.
My current hack is somewhat messy. Presumably we will eventually be
able to remove Giant from this path. The question is: When? And
what is the current state of work in that area? Also, it seems to
me that a signal can come in after the check (with or without my
hack) and not get recognized until the next system call or interrupt.
This is bad.
Comments? John?
* userret / NEEDRESCHED case.
This case seems rather odd to me. I understand why we might need to
obtain the sched_lock as a memory barrier, but what happens if
NEEDRESCHED is set after we release the sched_lock? Interrupts
are enabled at the point where userret() is called, so there is
nothing preventing either a signal or a NEEDRESCHED from occurring
after we've tested it but before we've returned to usermode,
causing us to not honor either until the next interrupt or syscall.
Has anyone grappled with this issue yet?
-Matt
Matthew Dillon
<dillon@backplane.com>
(PATCH USED TO REPRODUCE THE ABOVE TESTS)
Index: i386/i386/trap.c
===================================================================
RCS file: /home/ncvs/src/sys/i386/i386/trap.c,v
retrieving revision 1.220
diff -u -r1.220 trap.c
--- i386/i386/trap.c 21 Mar 2002 19:27:15 -0000 1.220
+++ i386/i386/trap.c 29 Mar 2002 19:02:16 -0000
@@ -941,7 +941,14 @@
int args[8];
u_int code;
+ /*
+ * WARNING! FOR TESTING ONLY, per-cpu increment is not protected by
+ * critical_*() (add PCPU_INCR_INT macro?)
+ */
+#if 0
atomic_add_int(&cnt.v_syscall, 1);
+#endif
+ ++*PCPU_PTR(v_syscall);
#ifdef DIAGNOSTIC
if (ISPL(frame.tf_cs) != SEL_UPL) {
Index: kern/subr_trap.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/subr_trap.c,v
retrieving revision 1.212
diff -u -r1.212 subr_trap.c
--- kern/subr_trap.c 21 Mar 2002 06:11:08 -0000 1.212
+++ kern/subr_trap.c 29 Mar 2002 18:43:43 -0000
@@ -72,13 +72,33 @@
struct ksegrp *kg = td->td_ksegrp;
int sig;
+#if 0
mtx_lock(&Giant);
PROC_LOCK(p);
while ((sig = CURSIG(p)) != 0)
postsig(sig);
PROC_UNLOCK(p);
mtx_unlock(&Giant);
+#else
+ PROC_LOCK(p);
+ if ((sig = CURSIG(p)) != 0) {
+ PROC_UNLOCK(p);
+ mtx_lock(&Giant);
+ PROC_LOCK(p);
+ while ((sig = CURSIG(p)) != 0)
+ postsig(sig);
+ PROC_UNLOCK(p);
+ mtx_unlock(&Giant);
+ } else {
+ PROC_UNLOCK(p);
+ }
+#endif
+#if 1
+ if (td->td_priority != kg->kg_user_pri ||
+ (ke->ke_flags & KEF_NEEDRESCHED) ||
+ (p->p_sflag & PS_PROFIL)) {
+#endif
mtx_lock_spin(&sched_lock);
td->td_priority = kg->kg_user_pri;
if (ke->ke_flags & KEF_NEEDRESCHED) {
@@ -106,8 +126,12 @@
ticks = ke->ke_sticks - oticks;
mtx_unlock_spin(&sched_lock);
addupc_task(ke, TRAPF_PC(frame), (u_int)ticks * psratio);
- } else
+ } else {
mtx_unlock_spin(&sched_lock);
+ }
+#if 1
+ }
+#endif
}
/*
Index: sys/pcpu.h
===================================================================
RCS file: /home/ncvs/src/sys/sys/pcpu.h,v
retrieving revision 1.7
diff -u -r1.7 pcpu.h
--- sys/pcpu.h 20 Mar 2002 18:01:52 -0000 1.7
+++ sys/pcpu.h 29 Mar 2002 19:26:39 -0000
@@ -58,6 +58,10 @@
u_int pc_cpuid; /* This cpu number */
u_int pc_cpumask; /* This cpu mask */
u_int pc_other_cpus; /* Mask of all other cpus */
+ /*
+ * XXX embed vmmeter structure here?
+ */
+ u_int pc_v_syscall; /* system calls counter */
SLIST_ENTRY(pcpu) pc_allcpu;
struct lock_list_entry *pc_spinlocks;
#ifdef KTR_PERCPU