Board: DFBSD_commit
:...
:> prefer NOT to do).  I did a quick timing test on sys_set_tls_area()
:> and it costs around 339ns on my AMD64 test cube.  But this is still
:> going to be far higher performing than having to call __tls_get_addr
:> all the time.  The procedure setup cost for figuring out the GOT offset
:> alone is 17ns on the same box.
:
:It's not about calling __tls_get_addr, but
:    mov %gs:0, %eax
:    mov a@NTPOFF(%eax), %eax
:vs.
:    mov %gs:a@NTPOFF, %eax
:
:The difference is one load instruction, possibly with a pipeline stall
:involved.  The difference should be zero once the base register is
:loaded.
:
:Joerg

    There's no pipeline stall there.  %gs:0 is likely to ALWAYS be in the
    L1 cache.  The %gs prefix itself can cost time versus a non-prefixed
    relative load instruction, so my guess is that it turns out to be a
    wash.

    Also keep in mind that GCC will cache the data loaded from %gs:0,
    which makes it even less of an issue (and potentially faster than
    %gs:OFFSET).

    I did a quick test with both the direct and indirect %gs models and
    couldn't see any difference in timing.

                                        Matthew Dillon
                                        <dillon@backplane.com>
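
    For anyone who wants to repeat the direct vs. indirect comparison,
    here is a minimal sketch of such a timing test. It is not the test
    used above. The names tls_set(), tls_sum() and time_tls_reads() are
    made up for illustration, and __thread is the GCC extension for
    thread-local variables. It assumes GCC on x86, where building once
    with -mtls-direct-seg-refs and once with -mno-tls-direct-seg-refs
    should select the segment-relative and the base-pointer-relative
    access forms respectively (the segment register is %gs on i386 and
    %fs on x86-64).

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <time.h>

/* Two thread-local variables (hypothetical names, for illustration). */
__thread int a;
__thread int b;

/* Store values into the calling thread's copies of a and b. */
void tls_set(int x, int y)
{
    a = x;
    b = y;
}

/*
 * Read both TLS variables.  In the indirect model the compiler may load
 * the thread base pointer once and reuse it for both accesses.
 */
int tls_sum(void)
{
    return a + b;
}

/* Return the average cost in nanoseconds of one tls_sum() call. */
double time_tls_reads(long iters)
{
    struct timespec t0, t1;
    volatile int sink = 0;  /* keeps the loop from being optimized away */
    long i;

    assert(iters > 0);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (i = 0; i < iters; i++)
        sink += tls_sum();
    clock_gettime(CLOCK_MONOTONIC, &t1);
    (void)sink;

    return ((double)(t1.tv_sec - t0.tv_sec) * 1e9 +
            (double)(t1.tv_nsec - t0.tv_nsec)) / (double)iters;
}
```

    Call time_tls_reads() with a large iteration count from a small
    main() under each build and compare the two results. Inspecting the
    output of gcc -S is the more reliable way to confirm which access
    form the compiler actually emitted.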