On March 19th Alex Hornung wrote:
> Both dsched and dm provide some nice abstractions that will
> allow you to focus more on your project itself than messing with a lot
> of our kernel internals.

DSched's interface for writing disk schedulers is very straightforward;
you provide 4 methods for any disk scheduler: *[1]

1) prepare(): Prepare is invoked any time a disk scheduler is set for a
   disk. Depending on how complex your scheduler is, it can range from
   doing nothing (see noop_prepare() in kern/kern_dsched.c) to doing
   anything complex you want (see fq_prepare in
   kern/dsched/fq/fq_diskops.c) such as starting kernel threads and
   allocating stuff and whatnot.

2) teardown(): called when a disk scheduler is being shut down or
   switched away from, this reverses the work of prepare(). It can
   deallocate things, kill threads, whatnot.

3) cancel_all(): cancel_all() should remove all queued I/O requests to
   a given disk instead of issuing them.

4) queue(): This is the most interesting routine! Queue accepts a
   single I/O request from a thread to a certain disk and does whatever
   work is needed to make that request either happen immediately or
   defer it to some point in the future.

    int queue(struct dsched_disk_ctx *disk, struct dsched_thread_io *tdio,
              struct bio *bio) {}

'disk' is a pointer to a per-disk structure; 'dsched_disk_ctx' just
contains a reference to the disk it is controlling along with a few
other fields (a refcount, some list linkages, ...).

'tdio' is a reference to the thread performing the disk I/O request. It
provides a place to queue BIOs and a few other references.

And 'bio' is the I/O request -- it holds a lot of information about the
request -- whether it is a read, write, or any one of the other commands
in sys/buf.h (see BUF_CMD_*), what the request data is, etc.

A queue() routine can completely pass the buck -- if it returns a
nonzero value, the underlying device is handed the BIO directly.
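To make the shape of those four entry points a bit more concrete, here is a rough userspace sketch. Nothing in it is the real kernel code: the mock_* types, the hold_* scheduler and mock_submit() are all made-up stand-ins, and the function-pointer signatures are only my guess at what the real table looks like. It just shows a toy scheduler that defers requests while it has room and otherwise passes the buck by returning nonzero from queue().

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins; none of these are the real kernel definitions. */
enum mock_state { MOCK_NEW, MOCK_ISSUED, MOCK_QUEUED, MOCK_CANCELLED };
struct mock_bio  { enum mock_state state; };
struct mock_tdio { int unused; };

#define MOCK_QMAX 8
struct mock_disk {
    struct mock_bio *held[MOCK_QMAX];   /* requests the scheduler deferred */
    int nheld;
    int prepared;
};

/* The four entry points, as a table of function pointers (assumed shape). */
struct mock_sched_ops {
    int  (*prepare)(struct mock_disk *);
    void (*teardown)(struct mock_disk *);
    void (*cancel_all)(struct mock_disk *);
    int  (*queue)(struct mock_disk *, struct mock_tdio *, struct mock_bio *);
};

/* prepare(): set up per-disk state when the scheduler is attached. */
int hold_prepare(struct mock_disk *d)
{
    d->nheld = 0;
    d->prepared = 1;
    return 0;
}

/* cancel_all(): drop every queued request instead of issuing it. */
void hold_cancel_all(struct mock_disk *d)
{
    for (int i = 0; i < d->nheld; i++)
        d->held[i]->state = MOCK_CANCELLED;
    d->nheld = 0;
}

/* teardown(): reverse prepare(); here that means emptying the queue. */
void hold_teardown(struct mock_disk *d)
{
    hold_cancel_all(d);
    d->prepared = 0;
}

/* queue(): defer while there is room, otherwise pass the buck (nonzero). */
int hold_queue(struct mock_disk *d, struct mock_tdio *t, struct mock_bio *b)
{
    (void)t;                            /* per-thread state unused in this toy */
    if (d->nheld == MOCK_QMAX)
        return 1;
    d->held[d->nheld++] = b;
    b->state = MOCK_QUEUED;
    return 0;
}

const struct mock_sched_ops hold_ops = {
    hold_prepare, hold_teardown, hold_cancel_all, hold_queue
};

/* Caller side: a nonzero return from queue() sends the request
 * straight to the (pretend) device. */
void mock_submit(const struct mock_sched_ops *ops, struct mock_disk *d,
                 struct mock_tdio *t, struct mock_bio *b)
{
    assert(ops != NULL && d != NULL && b != NULL);
    if (ops->queue(d, t, b) != 0)
        b->state = MOCK_ISSUED;
}
```

Submitting MOCK_QMAX + 1 requests through mock_submit() leaves the first MOCK_QMAX queued and the last one issued directly; a later cancel_all() marks only the queued ones as cancelled.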
:)

DSched sits in the middle of a fairly deep stack -- below it sits a
routine called bioqdisksort(), which also shuffles I/O requests a bit;
bioqdisksort() lives in kern/subr_disk.c, if you'd like to take a look.
In theory, stacking a disk scheduler on top of code that (mostly) sorts
reads ahead of writes is a bad idea. In practice, I've found performance
changes (a mix of good and bad) when disabling it and turning on Fair
Queuing... no idea why yet, though.

Anyway, writing a dsched module really is straightforward! The difficult
part is actually all inside queue() -- deciding when to queue an I/O
request versus issue it directly, and how to queue the requests.

We currently have two disk schedulers in the kernel -- No-Op does no
queueing and can be found inside kern/kern_dsched.c (look for the
functions and structures starting with noop_), and Fair Queuing, which
implements something of a fair queuing/fair share approach. In FQ,
threads will directly issue I/O till they hit a fair share limit; when
they do, they queue the I/O requests for a worker thread (fq_dispatcher)
to issue at a later time.

There are some solid references to what other OSes are doing here or
what research groups have tinkered with, if that is interesting:

(BSD specific):
* http://www.happyemi.org/hybrid/guide.html and
  http://web.archive.org/web/20060821124302/wikitest.freebsd.org/Hybrid
  ** These two pages describe an old attempt at adding a disk scheduler
     to FreeBSD 4.x+; their entry points look rather similar to DSched,
     except their queue()-analogue doesn't issue I/O; disks instead call
     get_first() to get requests. They implemented a simple scheduler
     called 'Hybrid', which is potentially interesting.

(Linux)
* http://retis.sssup.it/~fabio/linux/bfq/description.php and
  http://algo.ing.unimo.it/people/paolo/disk_sched/
  ** This is a pretty interesting and fairly new disk scheduler for
     Linux; it is another fair queuing scheduler, but it uses a weighted
     FQ variant and uses sector counts rather than byte counts for its
     budget. It has some snazzy features, like low-latency guarantees
     and minimum bandwidth guarantees.

There is probably work to be done in understanding how well our current
FQ scheduler works, how it can be improved, and how the dsched
interfaces themselves could be improved and play nicer with the
surrounding system. There is also plenty of scope for writing new,
better disk schedulers.

Hope this made some sense!

> The good thing about projects in these areas is that you can actually
> do the development on a vkernel, if you so like. It's very convenient
> to do so as you can simply gdb into the kernel instead of getting a
> core dump, and the reboot time is also cut :)

I'm really going to second this -- in under 5 minutes using vkernels on
leaf.dragonflybsd.org, I can have a fresh copy of the dragonfly sources,
apply a patch, build and run a kernel, and boot into it, without ever
touching a second machine. If you want to work in some VM (qemu or VBox
or whatever), that would be fine; running on real hardware would work
just fine also.

Good luck!

-- vs

*[1]: There are actually a few more entry points, but no scheduler
currently takes advantage of them.
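
Going back to the Fair Queuing description earlier in this mail (issue directly until a fair share limit is hit, then queue for a dispatcher thread), here is a toy userspace sketch of that decision. It is not the FQ code from kern/dsched/fq: every sketch_* name is hypothetical, the budget is a plain request count per interval, and charging deferred requests against the next interval's budget is my own simplifying assumption.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_QMAX 16

struct sketch_bio { int issued; };

/* Per-thread state: how much of the share is used, plus deferred requests. */
struct sketch_thread {
    int budget;                               /* requests allowed per interval */
    int used;                                 /* requests issued this interval */
    struct sketch_bio *pending[SKETCH_QMAX];  /* FIFO of deferred requests */
    int npending;
};

/*
 * queue()-like decision.
 * Returns 1 if the request was issued directly, 0 if it was deferred
 * for the dispatcher, and -1 if the pending queue is full.
 */
int sketch_queue(struct sketch_thread *t, struct sketch_bio *b)
{
    assert(t != NULL && b != NULL);
    if (t->used < t->budget) {
        t->used++;
        b->issued = 1;
        return 1;
    }
    if (t->npending == SKETCH_QMAX)
        return -1;
    t->pending[t->npending++] = b;
    return 0;
}

/*
 * Dispatcher pass: start a new interval, then issue deferred requests
 * in FIFO order while budget remains. Returns how many were issued.
 */
int sketch_dispatch(struct sketch_thread *t)
{
    int n = 0;

    assert(t != NULL);
    t->used = 0;
    while (n < t->npending && t->used < t->budget) {
        t->pending[n]->issued = 1;
        t->used++;
        n++;
    }
    /* Shift anything still waiting to the front of the queue. */
    for (int i = n; i < t->npending; i++)
        t->pending[i - n] = t->pending[i];
    t->npending -= n;
    return n;
}
```

With a budget of 2, five back-to-back requests give two direct issues and three deferrals; the first dispatcher pass then issues two of the deferred requests and the second pass issues the last one.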