看板 DFBSD_bugs 關於我們 聯絡資訊
On Thu, Apr 21, 2011 at 11:36:10AM -0700, Matthew Dillon wrote: > :> After poking here and the, I think this KKASSERT() can simply go away > :> as proc_remove_zombie() will wait for p->p_lock to drop to zero anyway. > :> > :... > :The following is what I have in my tree. What it does is to hold proc_token > :while waiting for p->p_lock to drop, just as done in proc_remove_zombie(). > :If I don't hold the proc_token as in the first chunk, I see the > : > : proc_remove_zombie: waiting for %p->p_lock to drop > : > :message a few times every hour on the console. I guess it may also > :imply that the race is between a code which holds proc_token for > :a long time but not p->p_token, like all*_scan(). > > It looks good, I would make two changes. One would be to shortcut > the case where p->p_lock is already 0 to avoid unnecessary contention > with proc_token in the critical exit path. > > if (p->p_lock) { > lwkt_gettoken(&proc_token); > while (p->p_lock) > tsleep(p, 0, "reap3", hz); > lwkt_reltoken(&proc_token); > } The problem here is that p->p_lock can become non-zero shortly after being observed as 0 on this reap3 loop (which originally caused the panic on KKASSERT), so I think this shortcut logic almost defeats that purpose. However, even without this shortcut logic, the added kprintf() still very occasionally shows the diagnostic message. > :... > :@@ -661,6 +661,7 @@ proc_remove_zombie(struct proc *p) > : { > : lwkt_gettoken(&proc_token); > : while (p->p_lock) { > :+ kprintf("%s: waiting for %p->p_lock to drop\n", __func__, p); > : tsleep(p, 0, "reap1", hz / 10); > : } > : LIST_REMOVE(p, p_list); /* off zombproc */ > :-- > :1.7.3.2 > > This one may unnecessarily spam the kprintf since the wait is 1/10 > of a second. Maybe conditionalize it with a variable so it only issues > one kprintf(). I'll put it under bootverbose. > With regards to getting rid of the timeout in the tsleep and using a > proactive wakeup(), we have to avoid calling wakeup() for 1->0 > transitions unless someone is known to be waiting on p_lock. This > can be implementing by adding a WAITING flag to the field and using > atomic_cmpset_int() to handle the (WAITING | 1) -> (0) transition and > then calling wakeup() if WAITING was set. > > I will augment the sys/refcount.h API and add refcount_wait() and > refcount_release_wakeup() which encapsulate the appropriate atomic > ops. I will leave it up to you if you want to then use the new API > functions for PHOLD/PRELE, which would give the tsleep case a > proactive wakeup() instead of having to wait for it to timeout. So what I need to do is to change PHOLD/PRELE to use refcount_acquire/ refcount_release_wakeup and replace p->p_lock loop with refcount_release_wakeup? I'll give it a try. Best Regards, YONETANI Tomokazu.