All,

During my 2010 GSoC project, I re-implemented the select(2) and poll(2) kernel interfaces on top of the kevent(2) interface. That project went mostly as expected, but now that the code has had some time to settle, a number of issues and additional opportunities have arisen. The project proposed here aims to bring the rework of the DragonFly kernel file descriptor event subsystems to its logical conclusion, fix any lingering bugs and generally put the code in maintainable and portable shape.

The first opportunity is to restructure the knote structure (kernel struct knote). Currently, when select(2), poll(2) or kevent(2) registers an interest in read, write or exceptional conditions (filters) on a descriptor, the internal kqueue_register() function is called, which looks up the requested knote based on the file descriptor and the filter type.

    struct knote {
            SLIST_ENTRY(knote)      kn_link;        /* for fd */
            TAILQ_ENTRY(knote)      kn_kqlink;      /* for kq_knlist */
            SLIST_ENTRY(knote)      kn_next;        /* for struct kqinfo */
            TAILQ_ENTRY(knote)      kn_tqe;         /* for kq_head */
            struct kqueue           *kn_kq;         /* which queue we are on */
            struct kevent           kn_kevent;
            int                     kn_status;
            int                     kn_sfflags;     /* saved filter flags */
            intptr_t                kn_sdata;       /* saved data field */
            union {
                    struct file     *p_fp;          /* file data pointer */
                    struct proc     *p_proc;        /* proc pointer */
            } kn_ptr;
            struct filterops        *kn_fop;
            caddr_t                 kn_hook;
    };

Doing things in this fashion is perfectly acceptable for the kevent(2) interface, as read, write and exceptional conditions are logically separated. For the poll(2), select(2) and future epoll(2)/devpoll(2) interfaces it is problematic, because in these interfaces the conditions are more closely tied to the descriptor. The code implementing select(2) and poll(2) at present must do a lot of extra work to return combined filter results. The epoll(2) interface would be extremely problematic to implement in this manner, potentially requiring full scans of all registered knotes.
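
To make the "extra work" concrete, here is a minimal userland sketch of the per-(descriptor, filter) lookup model described above. It is not the kernel code: struct model_knote, model_find(), model_poll_revents() and the MODEL_* filter names are hypothetical stand-ins, and the mapping of the exceptional filter to POLLPRI is an assumption made for illustration only.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>

/* Hypothetical filter names; these are NOT the real EVFILT_* values. */
enum model_filter { MODEL_READ, MODEL_WRITE, MODEL_EXCEPT, MODEL_NFILTERS };

/* Simplified model of a knote: one entry per (descriptor, filter) pair. */
struct model_knote {
    int               fd;
    enum model_filter filter;
    int               ready;    /* nonzero once the filter has fired */
};

/* Counts lookups so the cost of combining results is visible. */
size_t model_lookups;

/* Find one knote by (fd, filter), mirroring the lookup key in the text. */
const struct model_knote *
model_find(const struct model_knote *notes, size_t n, int fd,
    enum model_filter filter)
{
    model_lookups++;
    for (size_t i = 0; i < n; i++) {
        if (notes[i].fd == fd && notes[i].filter == filter)
            return &notes[i];
    }
    return NULL;
}

/*
 * Build poll(2)-style revents for a single descriptor. Because knotes
 * are keyed by filter as well as descriptor, this takes one lookup per
 * filter and then merges the results.
 */
short
model_poll_revents(const struct model_knote *notes, size_t n, int fd)
{
    /* Assumed mapping from filter to poll bit, for illustration. */
    static const short bits[MODEL_NFILTERS] = { POLLIN, POLLOUT, POLLPRI };
    short revents = 0;

    assert(notes != NULL || n == 0);
    for (int f = 0; f < MODEL_NFILTERS; f++) {
        const struct model_knote *kn =
            model_find(notes, n, fd, (enum model_filter)f);
        if (kn != NULL && kn->ready)
            revents |= bits[f];
    }
    return revents;
}
```

In this model every descriptor costs three separate lookups per poll(2) result, even when only one filter is registered for it. An interface that reports results per descriptor pays that cost for every descriptor it watches.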

The logical step here seems to be to restructure the knote structure in such a fashion that filters are grouped based on the file descriptor. Currently I am thinking of something like the following:

    /* kqueue, select, poll, epoll, devpoll (5) */
    #define KEVENT_INTERFACES 5

    SLIST_HEAD(klist, knote);

    struct knote {
            SLIST_ENTRY(knote)      kn_link;        /* for fd */
            TAILQ_ENTRY(knote)      kn_kqlink;      /* for kq_knlist */
            SLIST_ENTRY(knote)      kn_next;        /* for struct kqinfo */
            TAILQ_ENTRY(knote)      kn_tqe;         /* for kq_head */
            struct kqueue           *kn_kq;
            union {
                    struct file     *p_fp;          /* file data pointer */
                    struct proc     *p_proc;        /* proc pointer */
            } kn_ptr;
            struct knote_data       *notes[KEVENT_INTERFACES];
    };

    struct knote_data {
            struct kevent           kn_kevent;
            int                     kn_status;
            int                     kn_sfflags;     /* saved filter flags */
            intptr_t                kn_sdata;       /* saved data field */
            struct filterops        *kn_fop;
            caddr_t                 kn_hook;
    };

This or a similar structure would allow the other implementations (select(2)/poll(2)) to do a great deal less work, would allow for a relatively easy implementation of epoll(2), and would have virtually no consequence for the speed or robustness of the original kqueue implementation.

***** [BOLD, READ ME, PLEASE RESPOND, ETC.]
As the rest of the work hangs off of what is described above, it would be fantastic to hear any other thoughts/ideas/viewpoints on how to accomplish splitting or otherwise reworking struct knote.
*****

After restructuring kevent to function as described above and updating the existing select(2) and poll(2) implementations, I would like to implement epoll(2), replacing the current Linux implementation, as well as /dev/poll. This would uniquely differentiate DragonFly as the only platform to support all of the major I/O multiplexing schemes natively.

The next most pressing issue with the kevent implementation as it stands in DragonFly (and the other BSDs as well) is one of robustness. The beauty of kevent is that it is stateful, and stateful means fast.
The drawback of this is that there are many lists and queues that must be precisely managed in order for things not to go awry. This is evidenced by, for example, the ums USB mouse detach bug:

http://bugs.dragonflybsd.org/issue1873

As it stands, all kqueue/kevent implementations suffer from this problem. The most reasonable solution, apart from going back to a stateless approach like the former kernel poll support, is to limit direct list handling as much as possible and force kqueue filter implementers to do the right thing where possible. I propose to extend devfs with a new API for filter implementers that manages most aspects of device bring-up, teardown and knote management for all devices which feature a devfs device node.

Reference:
http://leaf.dragonflybsd.org/mailarchive/kernel/2011-02/msg00006.html

As a next step I would like to take this approach of centralized control of the lists and data structures even further by extending the setup and teardown APIs offered to the filters. As a part of this I foresee the various structures used by kqueue being allocated and fully managed by the kqueue subsystem. In particular, the destruction of any used objects should only happen after the kqueue subsystem is certain that there are no more consumers.

Reference:
http://leaf.dragonflybsd.org/mailarchive/kernel/2011-02/msg00008.html

There are several open bug reports that are obviously problems with the current kqueue implementation or the select/poll wrappers. Once the bulk of the above work is complete, I believe it would be prudent both to attempt to reproduce these bugs to see if they still exist and to attempt to fix them if at all possible. Additional testing should also be done to ensure that no regressions were introduced and that the epoll and /dev/poll interfaces work as expected with software that implements support for them.

References:
http://bugs.dragonflybsd.org/issue1730
http://bugs.dragonflybsd.org/issue1998
http://bugs.dragonflybsd.org/issue2011
http://bugs.dragonflybsd.org/issue2028

Finally, I would like to document the finalized filter implementation API and write a porting guide for kernel developers, to ease bringing in drivers from other platforms.

Best,
Sam