I'm trying to evaluate whether locking memory with mlockall improves latency on a Raspberry Pi.
Using the well-known cyclictest (v0.92) together with perf, I got the following results:

sudo perf stat ./cyclictest -p 90 -m -c 0 -i 3000 -n -h 250 -q -l 10000

# Total: 000009985
# Min Latencies: 00038
# Avg Latencies: 00082
# Max Latencies: 00386
# Histogram Overflows: 00015

 Performance counter stats for './cyclictest -p 90 -m -c 0 -i 3000 -n -h 250 -q -l 10000':

        818.925000      task-clock (msec)         #    0.027 CPUs utilized
            13,362      context-switches          #    0.016 M/sec
                 0      cpu-migrations            #    0.000 K/sec
                56      page-faults               #    0.068 K/sec
       471,078,551      cycles                    #    0.575 GHz                      (50.34%)
       282,495,112      stalled-cycles-frontend   #   59.97% frontend cycles idle     (51.67%)
        13,419,172      stalled-cycles-backend    #    2.85% backend cycles idle      (52.93%)
        68,489,877      instructions              #    0.15  insns per cycle
                                                  #    4.12  stalled cycles per insn  (38.41%)
         7,553,254      branches                  #    9.223 M/sec                    (30.02%)
         1,627,813      branch-misses             #   21.55% of all branches          (34.01%)

      30.232651000 seconds time elapsed

Without the -m option (no mlockall):

sudo perf stat ./cyclictest -p 90 -c 0 -i 3000 -n -h 250 -q -l 10000

# Total: 000009988
# Min Latencies: 00038
# Avg Latencies: 00080
# Max Latencies: 00407
# Histogram Overflows: 00012

 Performance counter stats for './cyclictest -p 90 -c 0 -i 3000 -n -h 250 -q -l 10000':

        772.978000      task-clock (msec)         #    0.026 CPUs utilized
            13,363      context-switches          #    0.017 M/sec
                 0      cpu-migrations            #    0.000 K/sec
                66      page-faults               #    0.085 K/sec
       444,135,743      cycles                    #    0.575 GHz                      (41.26%)
       271,762,254      stalled-cycles-frontend   #   61.19% frontend cycles idle     (48.87%)
         8,522,179      stalled-cycles-backend    #    1.92% backend cycles idle      (56.53%)
        65,640,536      instructions              #    0.15  insns per cycle
                                                  #    4.14  stalled cycles per insn  (37.62%)
         7,453,674      branches                  #    9.643 M/sec                    (34.44%)
         1,584,489      branch-misses             #   21.26% of all branches          (25.24%)

      30.197211000 seconds time elapsed

It looks like Max Latencies gets slightly smaller with -m (386 vs. 407).

My question: page-faults only drops a little with -m (56 vs. 66); the faults are not eliminated.
Is this normal? I thought that once memory is locked with mlockall there would be no page faults at all.

Thanks

--
※ Posted from: 批踢踢實業坊 (ptt.cc), from: 90.41.67.118
※ Article URL: https://www.ptt.cc/bbs/LinuxDev/M.1446054583.A.EB6.html
yvb: Just loading the program's own text and its libs already causes quite a few page faults. 10/29 16:24
wtchen: So apart from the initialization at the start, there won't be any more page faults? 10/29 19:28
I ran an experiment: raise the loop count tenfold and see whether the page-fault count goes up.

With mlockall: page-faults stays at 55-56 and does not increase

 Performance counter stats for './cyclictest -p 90 -m -c 0 -i 3000 -n -h 250 -q -l 100000':

       7202.248000      task-clock (msec)         #    0.024 CPUs utilized
           130,818      context-switches          #    0.018 M/sec
                 0      cpu-migrations            #    0.000 K/sec
                55      page-faults               #    0.008 K/sec
     4,079,431,733      cycles                    #    0.566 GHz                      (48.12%)
     2,569,771,515      stalled-cycles-frontend   #   62.99% frontend cycles idle     (49.99%)
        69,883,756      stalled-cycles-backend    #    1.71% backend cycles idle      (51.78%)
       643,633,565      instructions              #    0.16  insns per cycle
                                                  #    3.99  stalled cycles per insn  (34.40%)
        72,253,517      branches                  #   10.032 M/sec                    (32.91%)
        15,166,468      branch-misses             #   20.99% of all branches          (31.47%)

     300.240982143 seconds time elapsed

Without mlockall: page-faults stays at 66-67

 Performance counter stats for './cyclictest -p 90 -c 0 -i 3000 -n -h 250 -q -l 100000':

       7181.634000      task-clock (msec)         #    0.024 CPUs utilized
           130,892      context-switches          #    0.018 M/sec
                 0      cpu-migrations            #    0.000 K/sec
                67      page-faults               #    0.009 K/sec
     4,072,629,665      cycles                    #    0.567 GHz                      (49.76%)
     2,537,027,318      stalled-cycles-frontend   #   62.29% frontend cycles idle     (49.79%)
        70,191,503      stalled-cycles-backend    #    1.72% backend cycles idle      (50.05%)
       627,997,620      instructions              #    0.15  insns per cycle
                                                  #    4.04  stalled cycles per insn  (34.31%)
        71,914,012      branches                  #   10.014 M/sec                    (33.07%)
        15,190,645      branch-misses             #   21.12% of all branches          (33.44%)

     300.195795144 seconds time elapsed

It looks like increasing the loop count does not increase page-faults (with or without mlockall).

※ Edited: wtchen (90.41.214.241), 10/29/2015 19:46:17
※ Edited: wtchen (90.41.214.241), 10/29/2015 19:51:13
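The same observation can be checked from inside a program rather than with perf. The following is a minimal sketch, assuming a Linux system where getrusage() reports fault counters; the function names `total_page_faults` and `faults_during_sleep_loop` are made up for illustration and are not part of cyclictest. It counts only the faults that happen while a sleep loop runs, so the faults from loading the program and its libraries are excluded.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <sys/resource.h>
#include <time.h>

/* Page faults (minor + major) charged to this process so far, or -1 on error. */
long total_page_faults(void)
{
    struct rusage ru;

    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    return ru.ru_minflt + ru.ru_majflt;
}

/*
 * Sleep `loops` times for `interval_ns` nanoseconds each (must be below one
 * second) and return how many page faults occurred while looping, or -1 if
 * the counters are unavailable.
 */
long faults_during_sleep_loop(int loops, long interval_ns)
{
    struct timespec interval = { .tv_sec = 0, .tv_nsec = interval_ns };
    long before = total_page_faults();

    for (int i = 0; i < loops; i++)
        nanosleep(&interval, NULL);

    long after = total_page_faults();
    return (before < 0 || after < 0) ? -1 : after - before;
}
```

Calling `faults_during_sleep_loop()` with 10 times as many loops should give roughly the same small number if the faults really come from startup and first use rather than from the loop itself. The very first call may still record a fault or two, for example from lazy symbol binding.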
yvb: ...... Under what circumstances do you think a page fault occurs? 10/29 21:58
wtchen: I thought that after a process sleeps or uses up its time slice 10/30 04:00
wtchen: it gets swapped out, and a page fault only happens when it comes back into memory 10/30 04:01
wtchen: I read the man page for mlockall; what it does is 10/30 04:02
wtchen: preventing that memory from being paged to the swap 10/30 04:02
wtchen: so I assumed mlockall = no swap 10/30 04:04
yvb: You may be confusing swapping (paging) with context switching... 10/30 16:51
yvb: How about reading the Wikipedia material, or checking the difference on Google? 10/30 16:51
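To make the distinction concrete: the kernel accounts context switches and page faults separately, and getrusage() exposes both. The sketch below is an illustration only (the `struct activity` type and both function names are hypothetical). It samples the two counters around a series of sleeps. A sleeping task gives up the CPU, which shows up as voluntary context switches, but its pages normally stay in RAM, so the fault counter should barely move.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <sys/resource.h>
#include <time.h>

struct activity {
    long voluntary_switches; /* task blocked and gave up the CPU, e.g. in nanosleep() */
    long page_faults;        /* a touched page was not mapped or not resident */
};

/* Fill `out` with the process-wide counters so far; returns 0 on success. */
int sample_activity(struct activity *out)
{
    struct rusage ru;

    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    out->voluntary_switches = ru.ru_nvcsw;
    out->page_faults = ru.ru_minflt + ru.ru_majflt;
    return 0;
}

/* Sleep `loops` times and report how much each counter grew in `delta`. */
int activity_during_sleeps(int loops, long interval_ns, struct activity *delta)
{
    struct timespec interval = { .tv_sec = 0, .tv_nsec = interval_ns };
    struct activity before, after;

    if (sample_activity(&before) != 0)
        return -1;
    for (int i = 0; i < loops; i++)
        nanosleep(&interval, NULL);
    if (sample_activity(&after) != 0)
        return -1;

    delta->voluntary_switches = after.voluntary_switches - before.voluntary_switches;
    delta->page_faults = after.page_faults - before.page_faults;
    return 0;
}
```

This matches the perf output in the thread, where context-switches grows about tenfold with the loop count while page-faults stays flat.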
final01: Page faults should indeed go down, but cold page faults can't be avoided 10/31 00:00
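A cold page fault is the fault taken the first time a page is touched, before the kernel has backed it with physical memory. As a rough sketch (assuming Linux, anonymous mmap, and no earlier mlockall(MCL_FUTURE) in the process; all function names here are invented for the example), the code below touches a fresh mapping twice and counts the minor faults for each pass.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

/* Minor (no disk I/O) page faults of this process so far, or -1 on error. */
long minor_faults(void)
{
    struct rusage ru;

    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    return ru.ru_minflt;
}

/* Write one byte into every page of `buf`; return the minor faults incurred. */
long faults_touching(volatile unsigned char *buf, size_t len, size_t page)
{
    long before = minor_faults();

    for (size_t off = 0; off < len; off += page)
        buf[off] = 1;

    long after = minor_faults();
    return (before < 0 || after < 0) ? -1 : after - before;
}

/*
 * Map `pages` fresh anonymous pages and touch them twice.
 * `*cold` receives the faults of the first pass, `*warm` those of the second.
 */
int cold_vs_warm_faults(size_t pages, long *cold, long *warm)
{
    long page = sysconf(_SC_PAGESIZE);

    if (page <= 0 || pages == 0)
        return -1;

    size_t len = pages * (size_t)page;
    void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;

    *cold = faults_touching(buf, len, (size_t)page);
    *warm = faults_touching(buf, len, (size_t)page);
    munmap(buf, len);
    return (*cold < 0 || *warm < 0) ? -1 : 0;
}
```

On a typical system the first pass is expected to fault about once per page and the second pass not at all, which is why the fault count in the thread does not grow with the loop count.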
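wtchen: I was a bit confused, yes, but what I want is to avoid having, midway through the loop, 10/31 00:06
wtchen: a variable pushed out to swap while sleeping, so that after the sleep 10/31 00:07
wtchen: the variable is not found when accessed and a page fault occurs 10/31 00:07
wtchen: Loading it from swap back into RAM then wastes time and makes the timing inaccurate 10/31 00:08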
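For that goal, a common pattern in real-time code is to take the unavoidable faults up front and then lock everything. The sketch below is one possible way to do it and is not taken from cyclictest; `STACK_PREFAULT_BYTES`, `prefault_stack` and `lock_process_memory` are names chosen for this example, and the 64 KiB stack size is an assumption to adjust for the actual application.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Assumed upper bound on the stack the time-critical code will use. */
#define STACK_PREFAULT_BYTES (64 * 1024)

/* Touch one byte per page of a stack buffer so those pages are faulted in now. */
void prefault_stack(void)
{
    volatile unsigned char dummy[STACK_PREFAULT_BYTES];
    long page = sysconf(_SC_PAGESIZE);

    if (page <= 0)
        page = 4096; /* fallback guess if the page size is unavailable */
    for (size_t off = 0; off < sizeof dummy; off += (size_t)page)
        dummy[off] = 0;
}

/*
 * Fault in the stack, then lock all current and future mappings into RAM.
 * Returns 0 on success, otherwise the errno value reported by mlockall()
 * (typically EPERM or ENOMEM when the RLIMIT_MEMLOCK limit is too low).
 */
int lock_process_memory(void)
{
    prefault_stack();
    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0)
        return errno;
    return 0;
}
```

Calling `lock_process_memory()` once during initialization, before the timed loop starts, keeps both the cold faults and any later swap-in out of the measured path. It usually needs root or a raised memlock limit, which is consistent with the thread running cyclictest under sudo.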
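yvb: Unless main memory runs short, the kernel won't start swapping for no reason... 11/07 05:35
yvb: As for whether the timing is accurate, that depends on how much precision you need... 11/07 05:36
yvb: The overhead of context switching also differs from CPU to CPU. 11/07 05:37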
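Whether the timing is precise enough can be measured directly. The following is a simplified sketch of the kind of measurement cyclictest performs, not its actual implementation: it sleeps until an absolute deadline on CLOCK_MONOTONIC and records how late each wakeup is. The function name `max_wakeup_latency_us` is hypothetical; its parameters play the role of the loop count (-l) and interval in microseconds (-i) used in the thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000LL

static int64_t to_ns(const struct timespec *ts)
{
    return (int64_t)ts->tv_sec * NSEC_PER_SEC + ts->tv_nsec;
}

/*
 * Wake up `loops` times, once every `interval_us` microseconds, and return
 * the largest observed wakeup delay in microseconds, or -1 on error.
 */
long max_wakeup_latency_us(int loops, long interval_us)
{
    struct timespec next, now;
    int64_t max_ns = 0;

    if (clock_gettime(CLOCK_MONOTONIC, &next) != 0)
        return -1;

    for (int i = 0; i < loops; i++) {
        /* Advance the absolute deadline by one interval. */
        int64_t deadline = to_ns(&next) + (int64_t)interval_us * 1000;
        next.tv_sec = deadline / NSEC_PER_SEC;
        next.tv_nsec = deadline % NSEC_PER_SEC;

        if (clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL) != 0)
            return -1;
        if (clock_gettime(CLOCK_MONOTONIC, &now) != 0)
            return -1;

        /* Latency = how long after the deadline the task actually ran. */
        int64_t late = to_ns(&now) - deadline;
        if (late > max_ns)
            max_ns = late;
    }
    return (long)(max_ns / 1000);
}
```

The result includes timer, scheduler and context-switch overhead, so it varies with the CPU and with the scheduling policy. Without a real-time priority the numbers will generally be worse than what cyclictest reports with -p 90.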