On 20 March 2011 05:57, Brills Peng <brillsp@gmail.com> wrote:
> 1. The fq scheduler seems to create one thread for each disk, but I found a
> piece of comment at fq_core.c:114 indicating that there may be more than one
> tdio assigned to a diskctx. I could not find a chance to create such threads
> when thinking through the whole procedure of scheduling:
>     - The fq_prepare() function creates a dispatcher thread when attaching
>       to the disk.
>     - An io request comes from a higher level and invokes dsched_queue() to
>       find a proper thread to handle it, which then calls fq_queue(); this
>       may dispatch the request or insert it into the thread's queue. Note
>       that the only thread involved is that dispatcher.
>     - The dispatcher thread checks the queue when the disk is idle, and
>       tries to dispatch the requests in the queue.

dsched_fq creates two threads per disk: the so-called dispatcher thread, fq_dispatch(), which handles old requests that haven't been taken care of, and the balancer thread, fq_balance_thread(), which does around half of the calculations to determine the new transaction limit (and related values) of each thread/process.

BUT: each I/O that comes from a higher level, for example a user program, has its own THREAD CONTEXT and gets its own tdio. Unlike in some other operating systems such as FreeBSD, the thread context of whoever starts the I/O remains the same throughout the whole process of doing I/O. The tdio passed to the fq_queue() function is unique to that particular process/thread on that disk. In other words, ssh will have one tdio on disk ad0, and so will httpd on that disk; and both of those are threads other than the fq_dispatch() thread.

When a thread/process hits its hard transaction limit, its BIOs (I/O requests) get queued for later dispatch in a queue in the tdio structure. There is one such structure for each thread/process on each disk, and the diskctx structure contains a list of all the tdios for that particular disk. The fq_dispatcher thread only takes care of (some of) the queued requests; the others are handled by the original processes/threads themselves during the next scheduling interval, when their quota is available again.

> 2. The 'fair share' should relate to processes sending io requests, but I
> have no idea how the implementation in fq_core.c works. Each process should
> have its own queue and a budget to consume, but the queue and budget seem to
> be per-thread. If there is only one such thread for each disk, then the
> scheduler acts like a FIFO queue.

Each process HAS its own queue and budget. As I mentioned before, both live in the tdio: the queue is tdio->head.queue and the budget is tdio->max_tp [- tdio->transactions].

The way it works is simply a feedback controller. After a scheduling interval during which the disk was 100% (or close to it) busy, a disk-wide budget is calculated by accumulating the product of average latency and number of transactions of each thread/process. This product is then split up into fair shares for each process. For simplicity's sake, assume that all threads/processes have the same priority and that there are 4 of them. If the final product for the disk was Q_{disk}, each thread/process gets a piece of the cake of size Q_{tdio} = Q_{disk}/4.

The actual quota for each thread/process is a number of transactions per second: simply Q_{tdio}/average_latency, where average_latency is the average latency of that particular thread.
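To make the arithmetic concrete, here is a rough, stand-alone C sketch of that balancing step. It is NOT the actual fq_core.c code: apart from max_tp and transactions, which mirror real tdio field names, every type, name and number below is made up purely for illustration.

/*
 * Illustrative sketch of the per-interval fair-share calculation
 * described above. Hypothetical simplified types; not fq_core.c.
 */
#include <stdio.h>

struct sketch_tdio {
	int	transactions;	/* transactions completed last interval */
	int	avg_latency;	/* average latency per transaction (us) */
	int	max_tp;		/* resulting quota: max transactions allowed */
};

/*
 * After an interval during which the disk was (close to) 100% busy:
 *  1. accumulate Q_disk = sum(avg_latency * transactions) over all tdios;
 *  2. hand each tdio an equal share Q_tdio = Q_disk / ntdio (equal
 *     priorities assumed, as in the example above);
 *  3. turn the share back into a transaction quota:
 *     max_tp = Q_tdio / avg_latency.
 */
static void
sketch_balance(struct sketch_tdio *tdio, int ntdio)
{
	int	q_disk = 0, q_tdio, i;

	for (i = 0; i < ntdio; i++)
		q_disk += tdio[i].avg_latency * tdio[i].transactions;

	q_tdio = q_disk / ntdio;

	for (i = 0; i < ntdio; i++) {
		if (tdio[i].avg_latency > 0)
			tdio[i].max_tp = q_tdio / tdio[i].avg_latency;
	}
}

int
main(void)
{
	/* e.g. ssh, httpd and two background writers on the same disk */
	struct sketch_tdio tdio[4] = {
		{ 50, 100, 0 }, { 20, 200, 0 }, { 80, 100, 0 }, { 10, 400, 0 },
	};
	int i;

	sketch_balance(tdio, 4);
	for (i = 0; i < 4; i++)
		printf("tdio %d: new max_tp = %d\n", i, tdio[i].max_tp);
	return (0);
}

Note how the slow, high-latency tdios end up with a smaller transaction quota: each gets an equal share of the disk's time, not an equal number of transactions. The real balancer also has to account for differing priorities, which the sketch ignores.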
> 3. Is it possible and proper to implement the disk's arm scheduling in
> dsched? E.g., a queue sorted by arm position to increase the throughput of
> the FQ scheduler.

Maybe not by disk arm position, but you can access the LBA on the underlying device. Then again, I'm not particularly fond of schedulers that do only this; it doesn't offer much improvement, especially not on SSDs. And as Venkatesh mentioned, there is such an implementation already in place a few levels below dsched.

> And another question about the 'Implement further dsched disk scheduling
> policies' entry on the GSoC idea list: DragonFlyBSD has a FQ scheduler right
> now, but why add another policy to improve interactivity, since the 'fair'
> already guarantees a not-bad interactivity?

Fact is, I am an electronic engineer and not a computer scientist. My approach to the FQ scheduler was an "engineery" one: a simple feedback controller. It was mainly a proof of concept to show how the dsched framework makes scheduling easy and painless.

FQ tends to improve the interactivity of the system noticeably, especially when heavy background writer tasks are running, such as the regular HAMMER maintenance. But there is room for a lot of improvement; interactivity is nowhere near as good as it could be. Venkatesh has done some benchmarks with interbench, and the results are not all that promising. It would have been a miracle for a scheduler as simple as FQ to perform on par with complex schedulers like Linux's BFQ or CFQ, and the reality is that it's nowhere near them.

> And the final question: Justin mentioned an application form on the GSoC
> site, but according to Google's plan, students start handing in their
> applications on March 29. Should I send an application to the community in
> advance and then another to Google after March 29?

You only need to send in one application form, the one to Google. The "form" outlined by Justin should simply be included in the proposal you submit to Google, as it answers some of the questions we need to deal with. What you should definitely do is discuss your ideas with the community first, but there are no formalities involved in that.

Hope that helps,
Alex Hornung

PS: In this discussion I use thread and process interchangeably. This is because dsched assigns a new tdio to each kernel thread and each userland process, so it is a bit of a mix. The execution unit is of course a thread even for userland processes, but all the threads in a userland process share the same tdio.