[心得] gcc tail call (tail recursion)optimize

作者descent (「雄辯是銀，沉默是金」)

看板C_and_CPP

標題[心得] gcc tail call (tail recursion)optimize

時間Wed Mar 24 21:37:37 2021

[IMG_202011] fig 1. C++ 函數式編程有一陣時間, 沒有去研究一個技術主題之後, 我又回到了 revursive 上, 這是在讀 SICP 之後才注意到的技術。但是 revursive 實在太難, 我還沒能掌握這個思考方式。甚至連 tail call (tail recursive) 都搞不清楚, 後來在閱讀了「C++ 函數式編程 (Functional Programming in C++: How to improve your C++ programs using functional techniques)」(fig 1) 之後, 才知道就是字面上的意思。而只有 tail call 才能讓編譯器有機會轉成 loop, 避免使用 stack。一般看到的文章都說編譯器會把 tail call 轉成 loop, 而不使用 stack 空間, 避免 stack 爆掉, 這是真的嗎? 要怎麼確認呢? 就像 inline function, 我要怎麼確認真的有 inline 呢? 如果編譯器沒有給出明確訊息, 似乎也只能看編譯器輸出的組合語言了。我用了 sum.c 來測試, 這個程式就是把 arr[0] ~ arr[4] 印出來, 只是我用了 recursive function 印出, 而不是用 for loop。這個程式符合 tail call, call 自己的那行程式是在程式的最尾端, 所以應該可以被編譯器最佳化為 loop 才是。但如果沒下什麼編譯選項的話, 是看不到這樣的結果的。 list 1. sum.c 1 #include <stdio.h> 2 3 void print(int *array, int len, int n) 4 { 5 if (n >= len) 6 { 7 return; 8 } 9 else 10 { 11 printf("n: %d, %d\n", n, array[n]); 12 return print(array, len, n+1); 13 } 14 } 15 16 int main(int argc, char *argv[]) 17 { 18 int arr[] = {1,2,3,4,5,}; 19 print(arr, 5, 0); 20 return 0; 21 } 而和 tail recursive calls 有關的編譯選項是 -foptimize-sibling-calls。 -foptimize-sibling-calls Optimize sibling and tail recursive calls. Enabled at levels -O2, -O3, -Os. 但是很奇怪, 如果使用 -foptimize-sibling-calls 反而不會輸出 Optimize tail recursive calls 的組合語言, 需要用 -Os 才會將 tail recursive calls 轉成 loop。 gcc -no-pie -m32 sum.c -g -Os -o sum.tc.opt gcc -no-pie -m32 sum.c -g -foptimize-sibling-calls -o sum.no.tc.opt list 2 L37, 在 print() 裡頭還是會 call print() list 2. sum.no.tc.opt.txt 3 4 sum.no.tc.opt: file format elf32-i386 5 6 7 8 08049162 <print>: 9 8049162: 55 push %ebp 10 8049163: 89 e5 mov %esp,%ebp 11 8049165: 53 push %ebx 12 8049166: 83 ec 04 sub $0x4,%esp 13 8049169: e8 b4 00 00 00 call 8049222 <__x86.get_pc_thunk.ax> 14 804916e: 05 92 2e 00 00 add $0x2e92,%eax 15 8049173: 8b 55 10 mov 0x10(%ebp),%edx 16 8049176: 3b 55 0c cmp 0xc(%ebp),%edx 17 8049179: 7d 43 jge 80491be <print+0x5c> 18 804917b: 8b 55 10 mov 0x10(%ebp),%edx 19 804917e: 8d 0c 95 00 00 00 00 lea 0x0(,%edx,4),%ecx 20 8049185: 8b 55 08 mov 0x8(%ebp),%edx 21 8049188: 01 ca add %ecx,%edx 22 804918a: 8b 12 mov (%edx),%edx 23 804918c: 83 ec 04 sub $0x4,%esp 24 804918f: 52 push %edx 25 8049190: ff 75 10 pushl 0x10(%ebp) 26 8049193: 8d 90 08 e0 ff ff lea -0x1ff8(%eax),%edx 27 8049199: 52 push %edx 28 804919a: 89 c3 mov %eax,%ebx 29 804919c: e8 8f fe ff ff call 8049030 <printf@plt> 30 80491a1: 83 c4 10 add $0x10,%esp 31 80491a4: 8b 45 10 mov 0x10(%ebp),%eax 32 80491a7: 83 c0 01 add $0x1,%eax 33 80491aa: 83 ec 04 sub $0x4,%esp 34 80491ad: 50 push %eax 35 80491ae: ff 75 0c pushl 0xc(%ebp) 36 80491b1: ff 75 08 pushl 0x8(%ebp) 37 80491b4: e8 a9 ff ff ff call 8049162 <print> 38 80491b9: 83 c4 10 add $0x10,%esp 39 80491bc: eb 01 jmp 80491bf <print+0x5d> 40 80491be: 90 nop 41 80491bf: 8b 5d fc mov -0x4(%ebp),%ebx 42 80491c2: c9 leave 43 80491c3: c3 ret 44 45 080491c4 <main>: 46 80491c4: 8d 4c 24 04 lea 0x4(%esp),%ecx 47 80491c8: 83 e4 f0 and $0xfffffff0,%esp 48 80491cb: ff 71 fc pushl -0x4(%ecx) 49 80491ce: 55 push %ebp 50 80491cf: 89 e5 mov %esp,%ebp 51 80491d1: 51 push %ecx 52 80491d2: 83 ec 24 sub $0x24,%esp 53 80491d5: e8 48 00 00 00 call 8049222 <__x86.get_pc_thunk.ax> 54 80491da: 05 26 2e 00 00 add $0x2e26,%eax 55 80491df: c7 45 e4 01 00 00 00 movl $0x1,-0x1c(%ebp) 56 80491e6: c7 45 e8 02 00 00 00 movl $0x2,-0x18(%ebp) 57 80491ed: c7 45 ec 03 00 00 00 movl $0x3,-0x14(%ebp) 58 80491f4: c7 45 f0 04 00 00 00 movl $0x4,-0x10(%ebp) 59 80491fb: c7 45 f4 05 00 00 00 movl $0x5,-0xc(%ebp) 60 8049202: 83 ec 04 sub $0x4,%esp 61 8049205: 6a 00 push $0x0 62 8049207: 6a 05 push $0x5 63 8049209: 8d 45 e4 lea -0x1c(%ebp),%eax 64 804920c: 50 push %eax 65 804920d: e8 50 ff ff ff call 8049162 <print> 66 8049212: 83 c4 10 add $0x10,%esp 67 8049215: b8 00 00 00 00 mov $0x0,%eax 68 804921a: 8b 4d fc mov -0x4(%ebp),%ecx 69 804921d: c9 leave 70 804921e: 8d 61 fc lea -0x4(%ecx),%esp 71 8049221: c3 ret list 3 L65, 在 print() 是用 jmp 回到前面一個點 (L55), 就是 loop 的動作。 list 3. sum.tc.opt.txt 1 2 sum.tc.opt: file format elf32-i386 3 4 5 6 Disassembly of section .text: 7 8 08049050 <main>: 9 8049050: e8 9b 01 00 00 call 80491f0 <__x86.get_pc_thunk.ax> 10 8049055: 05 ab 2f 00 00 add $0x2fab,%eax 11 804905a: 8d 4c 24 04 lea 0x4(%esp),%ecx 12 804905e: 83 e4 f0 and $0xfffffff0,%esp 13 8049061: ff 71 fc pushl -0x4(%ecx) 14 8049064: 55 push %ebp 15 8049065: 89 e5 mov %esp,%ebp 16 8049067: 57 push %edi 17 8049068: 56 push %esi 18 8049069: 8d b0 14 e0 ff ff lea -0x1fec(%eax),%esi 19 804906f: 8d 45 d4 lea -0x2c(%ebp),%eax 20 8049072: 51 push %ecx 21 8049073: 8d 7d d4 lea -0x2c(%ebp),%edi 22 8049076: b9 05 00 00 00 mov $0x5,%ecx 23 804907b: 83 ec 30 sub $0x30,%esp 24 804907e: f3 a5 rep movsl %ds:(%esi),%es:(%edi) 25 8049080: 6a 00 push $0x0 26 8049082: 6a 05 push $0x5 27 8049084: 50 push %eax 28 8049085: e8 28 01 00 00 call 80491b2 <print> 29 804908a: 8d 65 f4 lea -0xc(%ebp),%esp 30 804908d: 31 c0 xor %eax,%eax 31 804908f: 59 pop %ecx 32 8049090: 5e pop %esi 33 8049091: 5f pop %edi 34 8049092: 5d pop %ebp 35 8049093: 8d 61 fc lea -0x4(%ecx),%esp 36 8049096: c3 ret 37 8049097: 66 90 xchg %ax,%ax 38 8049099: 66 90 xchg %ax,%ax 39 804909b: 66 90 xchg %ax,%ax 40 804909d: 66 90 xchg %ax,%ax 41 804909f: 90 nop 42 43 44 080491b2 <print>: 45 80491b2: 55 push %ebp 46 80491b3: 89 e5 mov %esp,%ebp 47 80491b5: 57 push %edi 48 80491b6: 56 push %esi 49 80491b7: 53 push %ebx 50 80491b8: e8 33 ff ff ff call 80490f0 <__x86.get_pc_thunk.bx> 51 80491bd: 81 c3 43 2e 00 00 add $0x2e43,%ebx 52 80491c3: 83 ec 0c sub $0xc,%esp 53 80491c6: 8b 75 10 mov 0x10(%ebp),%esi 54 80491c9: 8d bb 08 e0 ff ff lea -0x1ff8(%ebx),%edi 55 80491cf: 3b 75 0c cmp 0xc(%ebp),%esi 56 80491d2: 7d 14 jge 80491e8 <print+0x36> 57 80491d4: 50 push %eax 58 80491d5: 8b 45 08 mov 0x8(%ebp),%eax 59 80491d8: ff 34 b0 pushl (%eax,%esi,4) 60 80491db: 56 push %esi 61 80491dc: 46 inc %esi 62 80491dd: 57 push %edi 63 80491de: e8 4d fe ff ff call 8049030 <printf@plt> 64 80491e3: 83 c4 10 add $0x10,%esp 65 80491e6: eb e7 jmp 80491cf <print+0x1d> 66 80491e8: 8d 65 f4 lea -0xc(%ebp),%esp 67 80491eb: 5b pop %ebx 68 80491ec: 5e pop %esi 69 80491ed: 5f pop %edi 70 80491ee: 5d pop %ebp 71 80491ef: c3 ret 看到這樣的結果就安心了, 真的是像文章上說的一樣, 有做 tail recursive calls Optimization。 -- 紙上得來終覺淺，絕知此事要躬行。 -- ※ 發信站: 批踢踢實業坊(ptt.cc), 來自: 1.200.119.219 (臺灣) ※ 文章網址: https://www.ptt.cc/bbs/C_and_CPP/M.1616593067.A.A07.html

推 cuteSquirrel: 推實驗精神 03/24 21:40

推 Lipraxde: 下 -fno-optimize-sibling-calls 倒確實是沒做 TCO。 03/25 23:03

→ Lipraxde: Clang 的 TCO 相關選項也是跟 gcc 一樣怪怪的XD 03/25 23:03