閱讀文章 - 看板 FB_current

發信人scottl@samsco.org (Scott Long),

看板FB_current

標題Re: Exactly that commit (was Re: Latest -current 100% hang at the

發信站NCTU CS FreeBSD Server (Fri Jun 24 07:18:40 2011)

轉信站ptt!csnews.cs.nctu!news.cs.nctu!FreeBSD.cs.nctu!freebsd.org!owner-free

On Jun 23, 2011, at 9:01 PM, Justin T. Gibbs wrote: > On 6/22/11 4:09 PM, Kenneth D. Merry wrote: >> On Wed, Jun 22, 2011 at 08:13:25 +0400, Andrey Chernov wrote: >> > On Tue, Jun 21, 2011 at 09:54:04PM -0600, Kenneth D. Merry wrote: >> >> These two are interesting: >> >> >> >>> http://img825.imageshack.us/img825/1249/21062011014m.jpg

>> >>> http://img839.imageshack.us/img839/3791/21062011015.jpg

>> >> >> >> It looks like the GEOM event thread is stuck inside the cd(4)=20 > driver. The >> >> cd(4) driver is trying to acquire the peripheral lock, and is = sleeping >> >> until it gets it. >> >> >> >> What isn't clear is who is holding it. >=20 > ... >=20 >> The GEOM event thread is stuck sleeping in the mtx_sleep() call = above. So >> that tells me that one of several things is going on: >>=20 >> - There is a path in the cd(4) driver where it can call = cam_periph_hold() >> but not cam_periph_unhold(). >>=20 >> - There is another thread in the system that has called = cam_periph_hold(), >> and has gotten stuck before it can call cam_periph_unhold(). >>=20 >> - The hold/unhold logic is broken, and there is a case where a thread >> waiting for the lock can miss the wakeup. After looking at the code, = I >> don't think this is the case, but I may have missed something. >>=20 >> So it is probably one of the first two cases. >=20 > ... >=20 > I have a theory for the cause of this hang. >=20 > The commit that triggers this problem added calls to g_access() during = the > geom_dev probe. I believe this hit a race in cdregister() where > the periph hold lock is dropped around the changer probe code. Why = the > periph hold lock is dropped there, I do not know as I haven't fully > reviewed the changer probe code. >=20 Are you talking about this? disk_create(softc->disk, DISK_VERSION); cam_periph_lock(periph); cam_periph_unhold(periph); [...] if (((cgd->ccb_h.target_lun > 0) && ((softc->quirks & CD_Q_NO_CHANGER) =3D=3D 0)) || ((softc->quirks & CD_Q_CHANGER) !=3D 0)) { The unhold there compliments the hold that was done prior to = disk_create(). The hold/unhold is done as a hack around the need to = drop the periph/sim mutex while calling disk_create(), due to the = later's insistence on using blocking calls. I've wanted to re-think how = that pattern is done (it's the same gross hack in nearly all of the = periph drivers), but haven't gotten around to it. If the 'hold' = semaphore needs to be held longer to prevent the race that you're = theorizing, then it should be possible to simply extend its coverage in = the code block, but I'm not sure if it'll result in an unintended = deadlock with the changer enumeration/matching code. I _think_ that = it'll be ok, but the density of magic in the code is a bit overwhelming = at this time of night =3D-) Scott _______________________________________________ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"