
Friday, December 27, 2019

Implementing C++ exception handling (3) - __gxx_personality_v0

C++ exception handling is a remarkable piece of engineering. How did anyone first come up with such an intricate implementation mechanism?

Hard, and then harder still.
As before, this covers gcc's implementation, which uses DWARF unwind tables rather than the setjmp/longjmp scheme.

eh.cpp
135 int one()
136 {
145   throw 100;
146   #if 0
147   void *throw_obj = __cxa_allocate_exception(sizeof(int));
148   *(int *)throw_obj = 98;
149   __cxa_throw (throw_obj, &typeid(int), 0);
150   #endif
151 }
170 
171 void main()
172 {
173   try
174   {
180     one();
182   }
184   catch (std::exception &e)
185   {
186     printf("got exception: %s\n", e.what());
187   }
189   catch (int a)
191   {
192     printf("got exception: %d\n", a);
193   }
198 }
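
The `#if 0` block in eh.cpp hints at what `throw 100` expands to. As a minimal sketch, the same two runtime calls can be made by hand. The function names `throw_int_by_hand` and `catch_int_by_hand` are made up for this illustration, and it assumes a toolchain whose `<cxxabi.h>` declares `__cxa_allocate_exception` and `__cxa_throw` (libstdc++ does).

```cpp
#include <cxxabi.h>   // abi::__cxa_allocate_exception, abi::__cxa_throw
#include <typeinfo>

// Hand-written equivalent of `throw value;` for an int,
// following the #if 0 block in eh.cpp.
[[noreturn]] void throw_int_by_hand(int value)
{
  // Storage for the thrown object; the runtime keeps its own header in front of it.
  void *obj = abi::__cxa_allocate_exception(sizeof(int));
  *static_cast<int *>(obj) = value;
  // int has no destructor, so the third argument is null.
  abi::__cxa_throw(obj, const_cast<std::type_info *>(&typeid(int)), nullptr);
}

// Returns the caught value, or -1 if something other than an int arrived.
int catch_int_by_hand(int value)
{
  try {
    throw_int_by_hand(value);
  } catch (int a) {
    return a;
  } catch (...) {
    return -1;
  }
}
```

Calling `catch_int_by_hand(98)` goes through the same unwinder and personality routine as an ordinary `throw`, so the ordinary `catch (int a)` clause receives the value.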

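eh.cpp is a very simple example, but behind this simple-looking try/catch statement lies an extremely complex mechanism. The disassembly in list 2 shows the extra calls the compiler inserted: _Unwind_Resume, __cxa_begin_catch, __cxa_end_catch. Because the example is so simple, not much extra code is inserted.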

This post is about __gxx_personality_v0(). A lot happens between throw 100 and catch (int). Stack unwinding is probably the part most people know; __gxx_personality_v0() is likely less familiar.

During stack unwinding, the unwinder calls __gxx_personality_v0(). In this example it is called 2 times, once in each unwinding phase, roughly as in list 1.

list 1
1 code = (*fs.personality) (1, _UA_SEARCH_PHASE, exc->exception_class, exc, &cur_context);

2 code = (*fs.personality) (1, _UA_CLEANUP_PHASE | match_handler, exc->exception_class, exc, context);

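_Unwind_RaiseException(), called at eh_throw.cc L83, calls __gxx_personality_v0() 2 times. The first call (list 1 L1) passes _UA_SEARCH_PHASE. Its job is to find the landing_pad. What is a landing_pad? Put simply, it is the address where execution continues after throw 100. For main() that is 0x10048f (list 2 L358). 0x10048a (list 2 L357) is only the return address of the call to one(), which the personality routine uses as the ip to look up in the call-site table. Finding the address is not the end: the type of the thrown object, int, must also be matched against the types in the catch statements. eh.cpp has 2 catch statements, and clearly it is the second one, catch (int a), that matches.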

The second call (list 1 L2) passes _UA_CLEANUP_PHASE. This time the goal is to transfer control to the landing_pad, so once it finishes, the program continues at 0x10048f. Magical!
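
The two phases can be modelled in a few lines. This is a toy simulation, not the libgcc code: `Frame`, `UnwindResult` and `raise_exception` are names invented here, and type matching is reduced to string equality.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of one stack frame: which type (if any) it can catch,
// and whether it has a cleanup (e.g. a destructor call) to run.
struct Frame {
  std::string name;
  std::string catches;   // empty: no catch clause in this frame
  bool has_cleanup;
};

struct UnwindResult {
  bool handled;                       // false: phase 1 found no handler
  std::string handler_frame;
  std::vector<std::string> cleanups;  // frames whose cleanups ran in phase 2
};

// frames[0] is the innermost frame (the one that threw).
UnwindResult raise_exception(const std::vector<Frame> &frames,
                             const std::string &thrown)
{
  UnwindResult r{false, "", {}};

  // Phase 1 (_UA_SEARCH_PHASE): only look for a handler, change nothing.
  std::size_t handler = frames.size();
  for (std::size_t i = 0; i < frames.size(); ++i) {
    if (!frames[i].catches.empty() && frames[i].catches == thrown) {
      handler = i;
      break;
    }
  }
  if (handler == frames.size())
    return r;   // no handler: the real runtime ends up in std::terminate()

  // Phase 2 (_UA_CLEANUP_PHASE): walk again, run cleanups, stop at the handler.
  for (std::size_t i = 0; i < handler; ++i) {
    if (frames[i].has_cleanup)
      r.cleanups.push_back(frames[i].name);
  }
  r.handled = true;
  r.handler_frame = frames[handler].name;
  return r;
}
```

The point of searching first is that nothing is destroyed until a handler is known to exist. When phase 1 fails, the model returns without running any cleanup, which is the case where the real runtime reaches std::terminate().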

The jump itself is done by the unwinder in libgcc. It is complicated and I have not fully understood the implementation, but roughly:

_Unwind_RaiseException() calls uw_install_context (&this_context, &cur_context, frames);

#define uw_install_context(CURRENT, TARGET, FRAMES)                     \
  do                                                                    \
    {                                                                   \
      long offset = uw_install_context_1 ((CURRENT), (TARGET));         \
      void *handler = uw_frob_return_addr ((CURRENT), (TARGET));        \
      _Unwind_DebugHook ((TARGET)->cfa, handler);                       \
      _Unwind_Frames_Extra (FRAMES);                                    \
      __builtin_eh_return (offset, handler);                            \
    }                                                                   \
  while (0)

uw_install_context is a macro; once it runs, execution resumes at the landing_pad address.

After the jump to 0x10048f, the cmp instructions decide which handler runs; see list 2 L358 ~ 361. When rdx is 2 the catch (int a) block runs, and when rdx is 1 the catch (std::exception &e) block runs.

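This is no coincidence. It is the result of a chain of arrangements that starts at throw 100: at the end of __gxx_personality_v0() (list 5 L468 ~ 472), _Unwind_SetGR() stores the exception object pointer and handler_switch_value into the two exception-handling data registers, which on x86_64 are rax and rdx, and _Unwind_SetIP() sets the landing_pad. I have not understood every step in between, but it is a masterpiece of the C++ compiler and runtime.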

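What if there were no catch (int)? Then __gxx_personality_v0() runs only once, in the search phase. No matching catch handler is found, _Unwind_RaiseException() returns, and std::terminate() at eh_throw.cc L88 ends the program.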

eh_throw.cc
  1 // -*- C++ -*- Exception handling routines for throwing.
 70 
 71 extern "C" void
 72 __cxxabiv1::__cxa_throw (void *obj, std::type_info *tinfo,
 73                       void (*dest) (void *))
 74 {
 75   __cxa_eh_globals *globals = __cxa_get_globals ();
 76   globals->uncaughtExceptions += 1;
 77 
 78   // Definitely a primary.
 79   __cxa_refcounted_exception *header =
 80     __cxa_init_primary_exception(obj, tinfo, dest);
 81   header->referenceCount = 1;
 82 
 83   _Unwind_RaiseException (&header->exc.unwindHeader);
 84 
 85   // Some sort of unwinding error.  Note that terminate is a handler.
 86   __cxa_begin_catch (&header->exc.unwindHeader);
 87 
 88   std::terminate ();
 89 }
 90 

list 2. objdump -DC eh.elf
   301 
   302 00000000001003d2 <one()>:
   303   1003d2: 55                    push   %rbp
   304   1003d3: 48 89 e5              mov    %rsp,%rbp
   305   1003d6: bf 04 00 00 00        mov    $0x4,%edi
   306   1003db: e8 80 77 00 00        callq  107b60 <__cxa_allocate_exception>
   307   1003e0: c7 00 64 00 00 00     movl   $0x64,(%rax)
   308   1003e6: ba 00 00 00 00        mov    $0x0,%edx
   309   1003eb: be 48 84 10 00        mov    $0x108448,%esi
   310   1003f0: 48 89 c7              mov    %rax,%rdi
   311   1003f3: e8 6c 24 00 00        callq  102864 <__cxa_throw>
   337 
   338 0000000000100441 <main>:
   339   100441: 55                    push   %rbp
   340   100442: 48 89 e5              mov    %rsp,%rbp
   341   100445: 53                    push   %rbx
   342   100446: 48 83 ec 28           sub    $0x28,%rsp
   343   10044a: 48 89 7d d8           mov    %rdi,-0x28(%rbp)
   344   10044e: c7 45 e0 63 00 00 00  movl   $0x63,-0x20(%rbp)
   345   100455: 8b 5d e0              mov    -0x20(%rbp),%ebx
   346   100458: e8 13 01 00 00        callq  100570 <f()>
   347   10045d: 89 c1                 mov    %eax,%ecx
   348   10045f: 8b 15 ef 2b 03 00     mov    0x32bef(%rip),%edx        # 133054 <def>
   349   100465: 8b 05 e5 2b 03 00     mov    0x32be5(%rip),%eax        # 133050 <abc>
   350   10046b: 41 89 d9              mov    %ebx,%r9d
   351   10046e: 41 b8 00 00 00 00     mov    $0x0,%r8d
   352   100474: 89 c6                 mov    %eax,%esi
   353   100476: bf 68 80 10 00        mov    $0x108068,%edi
   354   10047b: b8 00 00 00 00        mov    $0x0,%eax
   355   100480: e8 20 05 00 00        callq  1009a5 <printf(char const*, ...)>
   356   100485: e8 48 ff ff ff        callq  1003d2 <one()>
   357   10048a: e9 9e 00 00 00        jmpq   10052d <main+0xec>
   358   10048f: 48 83 fa 01           cmp    $0x1,%rdx
   359   100493: 74 0e                 je     1004a3 <main+0x62>
   360   100495: 48 83 fa 02           cmp    $0x2,%rdx
   361   100499: 74 44                 je     1004df <main+0x9e>
   362   10049b: 48 89 c7              mov    %rax,%rdi
   363   10049e: e8 5d 5a 00 00        callq  105f00 <_Unwind_Resume>
   364   1004a3: 48 89 c7              mov    %rax,%rdi
   365   1004a6: e8 66 25 00 00        callq  102a11 <__cxa_begin_catch>
   366   1004ab: 48 89 45 e8           mov    %rax,-0x18(%rbp)
   367   1004af: 48 8b 45 e8           mov    -0x18(%rbp),%rax
   368   1004b3: 48 8b 00              mov    (%rax),%rax
   369   1004b6: 48 83 c0 10           add    $0x10,%rax
   370   1004ba: 48 8b 00              mov    (%rax),%rax
   371   1004bd: 48 8b 55 e8           mov    -0x18(%rbp),%rdx
   372   1004c1: 48 89 d7              mov    %rdx,%rdi
   373   1004c4: ff d0                 callq  *%rax
   374   1004c6: 48 89 c6              mov    %rax,%rsi
   375   1004c9: bf a4 80 10 00        mov    $0x1080a4,%edi
   376   1004ce: b8 00 00 00 00        mov    $0x0,%eax
   377   1004d3: e8 cd 04 00 00        callq  1009a5 <printf(char const*, ...)>
   378   1004d8: e8 19 26 00 00        callq  102af6 <__cxa_end_catch>
   379   1004dd: eb 4e                 jmp    10052d <main+0xec>
   380   1004df: 48 89 c7              mov    %rax,%rdi
   381   1004e2: e8 2a 25 00 00        callq  102a11 <__cxa_begin_catch>
   382   1004e7: 8b 00                 mov    (%rax),%eax
   383   1004e9: 89 45 e4              mov    %eax,-0x1c(%rbp)
   384   1004ec: 8b 45 e4              mov    -0x1c(%rbp),%eax
   385   1004ef: 89 c6                 mov    %eax,%esi
   386   1004f1: bf b8 80 10 00        mov    $0x1080b8,%edi
   387   1004f6: b8 00 00 00 00        mov    $0x0,%eax
   388   1004fb: e8 a5 04 00 00        callq  1009a5 <printf(char const*, ...)>
   389   100500: e8 f1 25 00 00        callq  102af6 <__cxa_end_catch>
   390   100505: eb 26                 jmp    10052d <main+0xec>
   391   100507: 48 89 c3              mov    %rax,%rbx
   392   10050a: e8 e7 25 00 00        callq  102af6 <__cxa_end_catch>
   393   10050f: 48 89 d8              mov    %rbx,%rax
   394   100512: 48 89 c7              mov    %rax,%rdi
   395   100515: e8 e6 59 00 00        callq  105f00 <_Unwind_Resume>
   396   10051a: 48 89 c3              mov    %rax,%rbx
   397   10051d: e8 d4 25 00 00        callq  102af6 <__cxa_end_catch>
   398   100522: 48 89 d8              mov    %rbx,%rax
   399   100525: 48 89 c7              mov    %rax,%rdi
   400   100528: e8 d3 59 00 00        callq  105f00 <_Unwind_Resume>
   401   10052d: 48 83 c4 28           add    $0x28,%rsp
   402   100531: 5b                    pop    %rbx
   403   100532: 5d                    pop    %rbp
   404   100533: c3                    retq   
   405 
 15543 
 15548 Disassembly of section .gcc_except_table:
 15549 
 15550 000000000010b704 <.gcc_except_table>:
 15551   10b704: ff                    (bad)  
 15552   10b705: ff 01                 incl   (%rcx)
 15553   10b707: 00 ff                 add    %bh,%bh
 15554   10b709: ff 01                 incl   (%rcx)
 15555   10b70b: 0c 10                 or     $0x10,%al
 15556   10b70d: 05 00 00 15 05        add    $0x5150000,%eax
 15557   10b712: 28 00                 sub    %al,(%rax)
 15558   10b714: 3d 05 00 00 ff        cmp    $0xff000005,%eax
 15559   10b719: 03 29                 add    (%rcx),%ebp
 15560   10b71b: 01 19                 add    %ebx,(%rcx)
 15561   10b71d: 3f                    (bad)  
 15562   10b71e: 0a 4e 03              or     0x3(%rsi),%cl
 15563   10b721: 5d                    pop    %rbp
 15564   10b722: 05 00 00 92 01        add    $0x1920000,%eax
 15565   10b727: 05 c6 01 00 ba        add    $0xba0001c6,%eax
 15566   10b72c: 01 05 d9 01 00 d4     add    %eax,-0x2bfffe27(%rip)
 15567   10b732: 01 18                 add    %ebx,(%rax)
 15568   10b734: 00 00                 add    %al,(%rax)
 15569   10b736: 02 00                 add    (%rax),%al
 15570   10b738: 01 7d 00              add    %edi,0x0(%rbp)
 15571   10b73b: 00 48 84              add    %cl,-0x7c(%rax)
 15572   10b73e: 10 00                 adc    %al,(%rax)
 15573   10b740: 78 8d                 js     10b6cf 
 15574   10b742: 10 00                 adc    %al,(%rax)
 15575   10b744: ff 03                 incl   (%rbx)
 15576   10b746: 1d 01 12 de 01        sbb    $0x1de1201,%eax
 15577   10b74b: c9                    leaveq 
 15578   10b74c: 06                    (bad)  
 15579   10b74d: 00 00                 add    %al,(%rax)
 15580   10b74f: d5                    (bad)  
 15581   10b750: 0a 05 b4 0c 01 91     or     -0x6efef34c(%rip),%al  
 15582   10b756: 0b 9c 01 00 00 01 00  or     0x10000(%rcx,%rax,1),%ebx
 15583   10b75d: 00 00                 add    %al,(%rax)
 15584   10b75f: 00 00                 add    %al,(%rax)
 15585   10b761: 00 00                 add    %al,(%rax)
 15586   10b763: 00 ff                 add    %bh,%bh
 15587   10b765: ff 01                 incl   (%rcx)
 15588   10b767: 00 ff                 add    %bh,%bh
 15589   10b769: 03 1d 01 12 82 01     add    0x1821201(%rip),%ebx
 15590   10b76f: 05 87 01 01 cb        add    $0xcb010187,%eax
 15591   10b774: 01 86 01 dd 02 00     add    %eax,0x2dd01(%rsi)
 15592   10b77a: f7 02 05 00 00 01     testl  $0x1000005,(%rdx)
 15593  ...
 15594   10b788: ff 03                 incl   (%rbx)
 15595   10b78a: 0d 01 04 10 02        or     $0x2100401,%eax
 15596   10b78f: 17                    (bad)  
 15597   10b790: 01 01                 add    %eax,(%rcx)
 15598   10b792: 00 00                 add    %al,(%rax)
 15599   10b794: 00 00                 add    %al,(%rax)
 15600  ...
 15601 

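Now, how is the landing_pad found? It comes from the data in the .gcc_except_table section (list 2 L15550). The technical term for this data is the language specific data area (LSDA). It cannot be read directly from the hex bytes, and the instructions objdump prints for this section are meaningless because the section holds data, not code. At runtime a small decoder has to interpret it, which piles complexity on complexity.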

The code looks something like list 3, though the real thing is far more complex than what list 3 shows.

list 3. Code that decodes the .gcc_except_table section
p = read_encoded_value (0, info.call_site_encoding, p, &cs_start);
p = read_encoded_value (0, info.call_site_encoding, p, &cs_len);
p = read_encoded_value (0, info.call_site_encoding, p, &cs_lp);
p = read_uleb128 (p, &cs_action);

For the data format of the .gcc_except_table section, see "c++ 異常處理 (2)" and "exception handling tables". I have not fully understood it either; broadly there are 3 tables.

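  1. call site table: each call site record holds 4 fields, the 4 read in list 3. The personality routine uses them to locate the landing_pad.

    From "c++ 異常處理 (2)":
    The call site table follows immediately after the LSDA header. It records which instructions in the program may throw. Each record has 4 fields:
    1) cs_start: the address of the instructions that may throw, as an offset from the landing pad base; the encoding is given by the first field of the LSDA header.
    2) cs_len: the length of the region of instructions that may throw; together with 1) it describes a contiguous run of instructions. Encoded the same way as 1).
    3) cs_lp: the offset of the landing pad that handles those instructions; 0 means there is no landing pad.
    4) cs_action: which actions to take, an unsigned LEB128 value; subtract 1 and use the result as an index into the action table.

    The ".gcc_except_table" article has a similar description: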
    1. The start of the instructions for the current call site, a byte offset from the landing pad base. This is encoded using the encoding from the header.
    2. The length of the instructions for the current call site, in bytes. This is encoded using the encoding from the header.
    3. A pointer to the landing pad for this sequence of instructions, or 0 if there isn’t one. This is a byte offset from the landing pad base. This is encoded using the encoding from the header.
    4. The action to take, an unsigned LEB128. This is 1 plus a byte offset into the action table. The value zero means that there is no action.

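    Probably still vague! In terms of list 5: cs_start and cs_len describe the instruction range from info.Start + cs_start up to, but not including, info.Start + cs_start + cs_len, and the record applies when ip falls inside that range (list 5 L289 ~ 291). cs_lp added to info.LPStart gives the landing_pad, and cs_action combined with info.action_table gives the address of the action_record.

    Code: list 5, L294
  2. action table: its entries lead to the types of all the catch clauses. eh.cpp has 2 catch statements, catch (std::exception &e) and catch (int a), and the action table is how the type_info of std::exception and of int are reached.

    These are then compared with the thrown object (here the integer 100) to decide whether this landing_pad is a catch handler. If nothing matches, the landing_pad may exist only to call some object's destructor, that is, to clean the object up.
  3. type table: records the types of all catch clauses.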
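
The call site lookup can be replayed on the bytes printed in list 2. The sketch below copies what I read as the call-site table of main() (0x10b71d to 0x10b735) and applies the search loop from list 5. `kCallSites`, `CallSiteHit` and `find_call_site` are names made up here, and it assumes info.LPStart equals info.Start, which is what the LSDA header byte ff (LPStart omitted) implies.

```cpp
#include <cstdint>

// Call-site table of main(), copied from list 2. The LSDA header gives
// call-site encoding 0x01, so every field is ULEB128.
static const unsigned char kCallSites[] = {
  0x3f, 0x0a, 0x4e, 0x03,               // start 0x3f, len 0x0a, lp 0x4e, action 3
  0x5d, 0x05, 0x00, 0x00,               // start 0x5d, len 5,    lp 0,    action 0
  0x92, 0x01, 0x05, 0xc6, 0x01, 0x00,   // start 0x92, len 5,    lp 0xc6, action 0
  0xba, 0x01, 0x05, 0xd9, 0x01, 0x00,   // start 0xba, len 5,    lp 0xd9, action 0
  0xd4, 0x01, 0x18, 0x00, 0x00,         // start 0xd4, len 0x18, lp 0,    action 0
};

static const unsigned char *read_uleb(const unsigned char *p, std::uint64_t *v)
{
  std::uint64_t result = 0;
  unsigned shift = 0;
  unsigned char byte;
  do {
    byte = *p++;
    result |= static_cast<std::uint64_t>(byte & 0x7f) << shift;
    shift += 7;
  } while (byte & 0x80);
  *v = result;
  return p;
}

struct CallSiteHit {
  bool found;                 // false: ip is not covered by any record
  std::uint64_t landing_pad;  // 0: no landing pad for this range
  std::uint64_t action;       // 0: no action record
};

// func_start plays the role of info.Start (and info.LPStart).
CallSiteHit find_call_site(std::uint64_t func_start, std::uint64_t ip)
{
  const unsigned char *p = kCallSites;
  const unsigned char *end = kCallSites + sizeof(kCallSites);
  while (p < end) {
    std::uint64_t cs_start, cs_len, cs_lp, cs_action;
    p = read_uleb(p, &cs_start);
    p = read_uleb(p, &cs_len);
    p = read_uleb(p, &cs_lp);
    p = read_uleb(p, &cs_action);
    if (ip < func_start + cs_start)
      break;                                   // table is sorted: we passed the ip
    if (ip < func_start + cs_start + cs_len)
      return {true, cs_lp ? func_start + cs_lp : 0, cs_action};
  }
  return {false, 0, 0};
}
```

With main() at 0x100441 and ip = 0x10048a - 1 (the return address of the call to one(), decremented as at list 5 L271), the first record matches: its range starts at 0x100480, the landing pad is 0x100441 + 0x4e = 0x10048f, and cs_action is 3.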
list 5 L365 obtains the type_info of a catch statement. It is complicated, and uses
  1. info->ttype_encoding
  2. info->ttype_base
  3. info->TType
together with ar_filter from the action_record:
353 p = action_record;
354 p = read_sleb128 (p, &ar_filter);

p = read_encoded_value (0, info.call_site_encoding, p, &cs_lp);
p = read_uleb128 (p, &cs_action);
p = read_sleb128 (p, &ar_filter);

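These functions all read the contents of the .gcc_except_table section. The values are stored in compact encodings (such as LEB128), so they have to be decoded first.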
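
To see where the switch value in rdx comes from, the action record loop of list 5 (L351 ~ 401) can be run over the 4 action table bytes I read from list 2 at 0x10b736. This is a simplified sketch. `kActionTable`, `catch_type_name` and `select_handler` are invented names, the type table is replaced by a hard-coded mapping, and matching is plain name equality where the real code calls get_adjusted_ptr().

```cpp
#include <cstdint>
#include <string>

// Action table of main(), copied from list 2: two records of (filter, disp).
static const unsigned char kActionTable[] = {0x02, 0x00, 0x01, 0x7d};

// Stand-in for the type table: filter 1 is std::exception, filter 2 is int.
static const char *catch_type_name(std::int64_t filter)
{
  return filter == 1 ? "std::exception" : filter == 2 ? "int" : "";
}

static const unsigned char *read_sleb(const unsigned char *p, std::int64_t *v)
{
  std::uint64_t result = 0;
  unsigned shift = 0;
  unsigned char byte;
  do {
    byte = *p++;
    result |= static_cast<std::uint64_t>(byte & 0x7f) << shift;
    shift += 7;
  } while (byte & 0x80);
  if (shift < 64 && (byte & 0x40))
    result |= ~std::uint64_t{0} << shift;      // sign-extend
  *v = static_cast<std::int64_t>(result);
  return p;
}

// Returns the handler switch value, or 0 if no catch clause matches.
int select_handler(std::uint64_t cs_action, const std::string &thrown_type)
{
  if (cs_action == 0)
    return 0;
  const unsigned char *action_record = kActionTable + cs_action - 1;
  for (;;) {
    std::int64_t ar_filter, ar_disp;
    const unsigned char *p = read_sleb(action_record, &ar_filter);
    read_sleb(p, &ar_disp);
    if (ar_filter > 0 && thrown_type == catch_type_name(ar_filter))
      return static_cast<int>(ar_filter);
    if (ar_disp == 0)
      return 0;                                // end of the chain
    action_record = p + ar_disp;               // relative to the ar_disp field
  }
}
```

Starting from cs_action = 3, the first record read is (filter 1, disp -3), which does not match a thrown int. The displacement leads back to the record (filter 2, disp 0), which does, so the switch value is 2, the value list 2 compares against rdx.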

list 5. libstdc++-v3/libsupc++/eh_personality.cc
  1 // -*- C++ -*- The GNU C++ exception personality routine.
  2 // Copyright (C) 2001-2018 Free Software Foundation, Inc.
  3 //
  4 // This file is part of GCC.
 83 // Return an element from a type table.
 84 
 85 static const std::type_info *
 86 get_ttype_entry (lsda_header_info *info, _uleb128_t i)
 87 {
 88   _Unwind_Ptr ptr;
 89 
 90   i *= size_of_encoded_value (info->ttype_encoding);
 91   read_encoded_value_with_base (info->ttype_encoding, info->ttype_base, info->TType - i, &ptr);
 93 
 94   return reinterpret_cast<const std::type_info *>(ptr);
 95 }
 96 
213 namespace __cxxabiv1
214 {
215 
216 extern "C"
217 _Unwind_Reason_Code
218 __gxx_personality_v0 (int version,
219         _Unwind_Action actions,
220         _Unwind_Exception_Class exception_class,
221         struct _Unwind_Exception *ue_header,
222         struct _Unwind_Context *context)
223 {
224   enum found_handler_type
225   {
226     found_nothing,
227     found_terminate,
228     found_cleanup,
229     found_handler
230   } found_type;
231 
232   lsda_header_info info;
233   const unsigned char *language_specific_data;
234   const unsigned char *action_record;
235   const unsigned char *p;
236   _Unwind_Ptr landing_pad, ip;
237   int handler_switch_value;
238   void* thrown_ptr = 0;
239   bool foreign_exception;
240   int ip_before_insn = 0;
241 
242   __cxa_exception* xh = __get_exception_header_from_ue(ue_header);
243 
244   // Interface version check.
245   if (version != 1)
246     return _URC_FATAL_PHASE1_ERROR;
247   foreign_exception = !__is_gxx_exception_class(exception_class);
248 
249   // Shortcut for phase 2 found handler for domestic exception.
250   if (actions == (_UA_CLEANUP_PHASE | _UA_HANDLER_FRAME)
251       && !foreign_exception)
252     {
253       restore_caught_exception(ue_header, handler_switch_value,
254           language_specific_data, landing_pad);
255       found_type = (landing_pad == 0 ? found_terminate : found_handler);
256       goto install_context;
257     }
258 
259   language_specific_data = (const unsigned char *)
260     _Unwind_GetLanguageSpecificData (context);
261 
262   // If no LSDA, then there are no handlers or cleanups.
263   if (! language_specific_data)
264     CONTINUE_UNWINDING;
265 
266   // Parse the LSDA header.
267   p = parse_lsda_header (context, language_specific_data, &info);
268   info.ttype_base = base_of_encoded_value (info.ttype_encoding, context);
269   ip = _Unwind_GetIPInfo (context, &ip_before_insn);
270   if (! ip_before_insn)
271     --ip;
272   landing_pad = 0;
273   action_record = 0;
274   handler_switch_value = 0;
275 
276   // Search the call-site table for the action associated with this IP.
277   while (p < info.action_table)
278     {
279       _Unwind_Ptr cs_start, cs_len, cs_lp;
280       _uleb128_t cs_action;
281 
282       // Note that all call-site encodings are "absolute" displacements.
283       p = read_encoded_value (0, info.call_site_encoding, p, &cs_start);
284       p = read_encoded_value (0, info.call_site_encoding, p, &cs_len);
285       p = read_encoded_value (0, info.call_site_encoding, p, &cs_lp);
286       p = read_uleb128 (p, &cs_action);
287 
288       // The table is sorted, so if we've passed the ip, stop.
289       if (ip < info.Start + cs_start)
290  p = info.action_table;
291       else if (ip < info.Start + cs_start + cs_len)
292  {
293    if (cs_lp)
294      landing_pad = info.LPStart + cs_lp;
295    if (cs_action)
296      action_record = info.action_table + cs_action - 1;
297    goto found_something;
298  }
299     }
300 
301   // If ip is not present in the table, call terminate.  This is for
302   // a destructor inside a cleanup, or a library routine the compiler
303   // was not expecting to throw.
304   found_type = found_terminate;
305   goto do_something;
306 
307  found_something:
308   if (landing_pad == 0)
309     {
310       // If ip is present, and has a null landing pad, there are
311       // no cleanups or handlers to be run.
312       found_type = found_nothing;
313     }
314   else if (action_record == 0)
315     {
316       // If ip is present, has a non-null landing pad, and a null
317       // action table offset, then there are only cleanups present.
318       // Cleanups use a zero switch value, as set above.
319       found_type = found_cleanup;
320     }
321   else
322     {
323       // Otherwise we have a catch handler or exception specification.
324 
325       _sleb128_t ar_filter, ar_disp;
326       const std::type_info* catch_type;
327       _throw_typet* throw_type;
328       bool saw_cleanup = false;
329       bool saw_handler = false;
330 
331 #if __cpp_rtti
332       // During forced unwinding, match a magic exception type.
333       if (actions & _UA_FORCE_UNWIND)
334  {
335    throw_type = &typeid(abi::__forced_unwind);
336  }
337       // With a foreign exception class, there's no exception type.
338       // ??? What to do about GNU Java and GNU Ada exceptions?
339       else if (foreign_exception)
340  {
341    throw_type = &typeid(abi::__foreign_exception);
342  }
343       else
344 #endif
345         {
346           thrown_ptr = __get_object_from_ue (ue_header);
347           throw_type = __get_exception_header_from_obj
348             (thrown_ptr)->exceptionType;
349         }
350 
351       while (1)
352  {
353    p = action_record;
354    p = read_sleb128 (p, &ar_filter);
355    read_sleb128 (p, &ar_disp);
356 
357    if (ar_filter == 0)
358      {
359        // Zero filter values are cleanups.
360        saw_cleanup = true;
361      }
362    else if (ar_filter > 0)
363      {
364        // Positive filter values are handlers.
365        catch_type = get_ttype_entry (&info, ar_filter);
366 
367        // Null catch type is a catch-all handler; we can catch foreign
368        // exceptions with this.  Otherwise we must match types.
369        if (! catch_type
370     || (throw_type
371         && get_adjusted_ptr (catch_type, throw_type,
372         &thrown_ptr)))
373   {
374     saw_handler = true;
375     break;
376   }
377      }
378    else
379      {
380        // Negative filter values are exception specifications.
381        // ??? How do foreign exceptions fit in?  As far as I can
382        // see we can't match because there's no __cxa_exception
383        // object to stuff bits in for __cxa_call_unexpected to use.
384        // Allow them iff the exception spec is non-empty.  I.e.
385        // a throw() specification results in __unexpected.
386        if ((throw_type
387      && !(actions & _UA_FORCE_UNWIND)
388      && !foreign_exception)
389     ? ! check_exception_spec (&info, throw_type, thrown_ptr,
390          ar_filter)
391     : empty_exception_spec (&info, ar_filter))
392   {
393     saw_handler = true;
394     break;
395   }
396      }
397 
398    if (ar_disp == 0)
399      break;
400    action_record = p + ar_disp;
401  }
402 
403       if (saw_handler)
404  {
405    handler_switch_value = ar_filter;
406    found_type = found_handler;
407  }
408       else
409  found_type = (saw_cleanup ? found_cleanup : found_nothing);
410     }
411 
412  do_something:
413    if (found_type == found_nothing)
414      CONTINUE_UNWINDING;
415 
416   if (actions & _UA_SEARCH_PHASE)
417     {
418       if (found_type == found_cleanup)
419  CONTINUE_UNWINDING;
420 
421       // For domestic exceptions, we cache data from phase 1 for phase 2.
422       if (!foreign_exception)
423         {
424    save_caught_exception(ue_header, context, thrown_ptr,
425     handler_switch_value, language_specific_data,
426     landing_pad, action_record);
427  }
428       return _URC_HANDLER_FOUND;
429     }
430 
431  install_context:
432   
433   // We can't use any of the cxa routines with foreign exceptions,
434   // because they all expect ue_header to be a struct __cxa_exception.
435   // So in that case, call terminate or unexpected directly.
436   if ((actions & _UA_FORCE_UNWIND)
437       || foreign_exception)
438     {
439       if (found_type == found_terminate)
440  std::terminate ();
441       else if (handler_switch_value < 0)
442  {
443    __try 
444      { std::unexpected (); } 
445    __catch(...) 
446      { std::terminate (); }
447  }
448     }
449   else
450     {
451       if (found_type == found_terminate)
452  __cxa_call_terminate(ue_header);
453 
454       // Cache the TType base value for __cxa_call_unexpected, as we won't
455       // have an _Unwind_Context then.
456       if (handler_switch_value < 0)
457  {
458    parse_lsda_header (context, language_specific_data, &info);
459    info.ttype_base = base_of_encoded_value (info.ttype_encoding,
460          context);
461 
462    xh->catchTemp = base_of_encoded_value (info.ttype_encoding, context);
463  }
464     }
465 
466   /* For targets with pointers smaller than the word size, we must extend the
467      pointer, and this extension is target dependent.  */
468   _Unwind_SetGR (context, __builtin_eh_return_data_regno (0),
469    __builtin_extend_pointer (ue_header));
470   _Unwind_SetGR (context, __builtin_eh_return_data_regno (1),
471    handler_switch_value);
472   _Unwind_SetIP (context, landing_pad);
473   return _URC_INSTALL_CONTEXT;
474 }
475 

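After writing this post, it suddenly feels not quite so hard. I could not find formal documentation for gcc_except_table, so I could only infer the meaning of the table fields from the code, which was extremely difficult. I traced it in gdb more than 20 times and still do not have a firm grasp.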

Now, where is the type table hiding? list 6 is part of the disassembly of .gcc_except_table.

list 6. The type info table in .gcc_except_table
10c980:       68 94 10 00 
10c984:       98 9d 10 00           

get_ttype_entry() looks up the catch_type. eh.cpp has catch (std::exception &e) and catch (int a), so there should be 2 records: the 2 in list 6.

catch_type = get_ttype_entry (&info, ar_filter);

The calculation yields a pointer p = 0x000000000010c984. Then, following list 7, the u4 field of union unaligned is read, which gives one type from the type table.

const union unaligned *u = (const union unaligned *) p;
result = u->u4;

There are several ways this calculation can go; this is just the one I traced.

(gdb) x/32xb 0x10c984
0x10c984: 0x98 0x9d 0x10 0x00 0xff 0x03 0x19 0x01
0x10c98c: 0x11 0x78 0xf6 0x0a 0x00 0x00 0x9c 0x0e
0x10c994: 0x05 0xfb 0x0f 0x01 0xd8 0x0e 0x9c 0x01
0x10c99c: 0x00 0x00 0x01 0x00 0x00 0x00 0x00 0x00

Here 0x98 0x9d 0x10 0x00 -> 0x109d98, which is
0000000000109d98 <typeinfo for std::exception>

The other one, 0x109468, is
0000000000109468 <typeinfo for int>

These correspond to the 2 type_info objects for std::exception and int.

list 7. libgcc/unwind-pe.h
 1 const unsigned char * read_encoded_value_with_base (unsigned char encoding, _Unwind_Ptr base, const unsigned char *p, _Unwind_Ptr *val)
 2 {
 3   union unaligned
 4     {
 5       void *ptr;
 6       unsigned u2 __attribute__ ((mode (HI)));
 7       unsigned u4 __attribute__ ((mode (SI)));
 8       unsigned u8 __attribute__ ((mode (DI)));
 9       signed s2 __attribute__ ((mode (HI)));
10       signed s4 __attribute__ ((mode (SI)));
11       signed s8 __attribute__ ((mode (DI)));
12     } __attribute__((__packed__));

Relevant values
ar_filter: 1
offset derived from ar_filter (ar_filter * size_of_encoded_value): 4
info->ttype_encoding: 3
info->ttype_base: 0
info->TType: 10c988

p is info->TType minus the offset derived from ar_filter = 10c988 - 4 = 10c984
catch_type: 0x0000000000109d98
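
The arithmetic above can be checked against the bytes of list 6. This sketch mirrors get_ttype_entry() for the traced case only: ttype_encoding 3 (a 4-byte absolute value) and ttype_base 0. `kTypeTable` and `ttype_entry` are names invented for the illustration.

```cpp
#include <cstdint>

// The 8 bytes just below info->TType (0x10c988), copied from list 6.
static const unsigned char kTypeTable[] = {
  0x68, 0x94, 0x10, 0x00,   // 0x10c980: reached with ar_filter 2
  0x98, 0x9d, 0x10, 0x00,   // 0x10c984: reached with ar_filter 1
};

// The table is indexed backwards from TType: entry i sits at TType - i * 4.
std::uint32_t ttype_entry(std::uint64_t ar_filter)
{
  const unsigned char *ttype = kTypeTable + sizeof(kTypeTable);  // stands in for info->TType
  const unsigned char *p = ttype - ar_filter * 4;
  // Assemble the little-endian 4-byte value (the u4 read in list 7).
  return static_cast<std::uint32_t>(p[0])
       | static_cast<std::uint32_t>(p[1]) << 8
       | static_cast<std::uint32_t>(p[2]) << 16
       | static_cast<std::uint32_t>(p[3]) << 24;
}
```

ar_filter 1 yields 0x109d98 (typeinfo for std::exception) and ar_filter 2 yields 0x109468 (typeinfo for int), which matches the order of the catch clauses in eh.cpp.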

eh1.cpp adds a two().

eh1.cpp
#include <cstdio>
#include <exception>

int one();  // defined below; declared here because two() calls it

class Exception: public std::exception {
public:
  Exception() {
    printf("Construct test exception\n");
  }
  ~Exception() {
    printf("Destruct test exception\n");
  }

  virtual const char *what() const noexcept override
  {
    return "Test eh";
  }
};

int two()
{
  Exception ex;      // its destructor must run while the stack unwinds
  return one();
}

int one()
{
  throw 100;
}

void main()
{
  try
  {
    two();
  }
  catch (std::exception &e)
  {
    printf("got exception: %s\n", e.what());
  }
  catch (int a)
  {
    printf("got exception: %d\n", a);
  }
}

What does the exception handling flow of eh1.cpp look like? eh1 has 2 landing_pads, namely ...

Ah ... this post is already too long. To be continued (if there is a next time) ...

ref:
Understanding the .gcc_except_table section in ELF binaries (GCC)

Wednesday, November 20, 2019

c++ runtime - rtti

A single lamp at a quiet window; the joy of it surpasses the five desires.
environment:
x86_64
g++ 8.2.0

After studying exception handling, it is rtti's turn. I want to know how rtti is actually done. C++ has so many mysterious mechanisms behind the scenes to support these features.

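rtti consists of 2 things:
  1. the dynamic_cast operator
  2. the typeid operator
What interests me is typeid, which is again compiler machinery. In C++ you cannot construct an object of class type_info yourself; you can only go through typeid to call the member function name(). You cannot write the following, but ...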

ex.cpp
type_info ti(??);
ti.name();

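Every C++ book says the only way to obtain a type_info object in a program is typeid. If you insisted on writing it anyway, what should be passed for '??'? I do not know, but it does not matter; there is no need to dig into it.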

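In fact, even after I made all the type_info constructors public, ex.cpp still would not compile; I will not go into why. In any case the type_info object is created for us by the C++ compiler. So where is it?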

To analyze type_info, I looked at the type_info-related code through gdb disassembly, the compiler's .s output, and objdump disassembly, and finally got some direction.

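Take rtti.cpp as the example. In list 1 L13465, typeinfo for int is the type_info object that represents int. The C++ compiler generates this object for us. It is not constructed at run time but emitted into the executable at compile time. In g++'s implementation the concrete type of typeinfo for int is __fundamental_type_info (list 3 L96).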

As you can see, __fundamental_type_info inherits from std::type_info.

__cxxabiv1::__fundamental_type_info fti{"123i"};

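Written this way (after including cxxabi.h, of course), you get a __fundamental_type_info, that is, a class derived from type_info, and calling its name() returns 123i. This is why I said the inability to obtain a type_info directly is not worth worrying about. Readers who like getting to the bottom of things can still dig through the code for this secret; it is not hard to understand.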
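
The claim that typeinfo for int is really a __fundamental_type_info can also be checked from inside a program, without gdb. std::type_info is polymorphic, so applying typeid to a type_info object reports that object's dynamic type. A minimal sketch, assuming an Itanium C++ ABI runtime such as libstdc++; `demangled` and `concrete_class_of_int_typeinfo` are helper names of my own.

```cpp
#include <cxxabi.h>   // abi::__cxa_demangle
#include <cstdlib>
#include <string>
#include <typeinfo>

// Demangle a type_info name; falls back to the raw name on failure.
std::string demangled(const std::type_info &ti)
{
  int status = 0;
  char *s = abi::__cxa_demangle(ti.name(), nullptr, nullptr, &status);
  std::string out = (status == 0 && s != nullptr) ? s : ti.name();
  std::free(s);   // __cxa_demangle returns malloc'ed memory (or null)
  return out;
}

// Names the concrete class of the compiler-generated "typeinfo for int" object.
std::string concrete_class_of_int_typeinfo()
{
  const std::type_info &ti = typeid(int);
  return demangled(typeid(ti));   // dynamic type of the type_info object itself
}
```

With g++ this should return `__cxxabiv1::__fundamental_type_info`, while `demangled(typeid(86))` returns plain `int`.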

rtti.cpp
102 void main()
103 {
107     printf("ni's name: %s\n", typeid(86).name());
108 }

list 1 L293 and L294 are typeid(86).name(). Written as a C function it would be

name(type_info *this);

This this is a type_info. It is the object at list 1 L13465, at address 0x10a818, which is why L293 loads that address into edi.

$_ZTIi at rtti.s L155 is this type_info object.

list 6 uses gdb to inspect 0x10a818; p *(std::type_info*)0x10a818 prints it as a type_info.

For a user-defined class, __cxxabiv1::__class_type_info is used to hold the type of the user-defined class.

__cxxabiv1::__class_type_info
1 (gdb) p *(std::type_info *)0x109ec8
2 $2 = {_vptr.type_info = 0x109f68 <vtable for __cxxabiv1::__class_type_info+16>,
3   __name = 0x109ee0 <typeinfo name for TestGlobalCtorDtor> "18TestGlobalCtorDtor"}
4 (gdb) p *(__cxxabiv1::__class_type_info *)0x109ec8
5 $3 = {<std::type_info> = {_vptr.type_info = 0x109f68 <vtable for __cxxabiv1::__class_type_info+16>,
6     __name = 0x109ee0 <typeinfo name for TestGlobalCtorDtor> "18TestGlobalCtorDtor"}, <No data fields>}

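The 18 is the length of TestGlobalCtorDtor, a class name I defined myself. This is not a coincidence: the string is the name as mangled under the Itanium C++ ABI, which prefixes an identifier with its length.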
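
Both observations from the gdb session can be reproduced in code. The sketch below assumes an Itanium C++ ABI toolchain (g++ here) and a class declared at global scope. The empty body of `TestGlobalCtorDtor` is a stand-in for the real class, and the two helper functions are names invented for this example.

```cpp
#include <cxxabi.h>   // abi::__cxa_demangle
#include <cstdlib>
#include <cstring>
#include <string>
#include <typeinfo>

struct TestGlobalCtorDtor {};   // stand-in with the same name as in the gdb session

// True if the name has the form <decimal length><identifier>, e.g. "18TestGlobalCtorDtor".
bool name_is_length_prefixed(const std::type_info &ti)
{
  const char *name = ti.name();
  char *rest = nullptr;
  unsigned long len = std::strtoul(name, &rest, 10);
  return rest != name && std::strlen(rest) == len;
}

// Demangled name of the class that implements the given type_info object.
std::string type_info_class(const std::type_info &ti)
{
  const std::type_info &impl = typeid(ti);   // dynamic type of the type_info object
  int status = 0;
  char *s = abi::__cxa_demangle(impl.name(), nullptr, nullptr, &status);
  std::string out = (status == 0 && s != nullptr) ? s : impl.name();
  std::free(s);
  return out;
}
```

For `typeid(TestGlobalCtorDtor)` the name is `18TestGlobalCtorDtor` and the implementing class is `__cxxabiv1::__class_type_info`. For `typeid(int)` the name is just `i`, so it carries no length prefix.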

list 1 x86_64-elf-objdump -CD rtti
   288 00000000001003ad <main>:
   289   1003ad: 55                      push   %rbp
   290   1003ae: 48 89 e5                mov    %rsp,%rbp
   291   1003b1: 48 83 ec 10             sub    $0x10,%rsp
   292   1003b5: 48 89 7d f8             mov    %rdi,-0x8(%rbp)
   293   1003b9: bf 18 a8 10 00          mov    $0x10a818,%edi
   294   1003be: e8 69 00 00 00          callq  10042c <std::type_info::name() const>
   295   1003c3: 48 89 c6                mov    %rax,%rsi
   296   1003c6: bf 84 a4 10 00          mov    $0x10a484,%edi
   297   1003cb: b8 00 00 00 00          mov    $0x0,%eax
   298   1003d0: e8 f1 04 00 00          callq  1008c6 <io::printf(char const*, ...)>
   299   1003d5: 90                      nop
   300   1003d6: c9                      leaveq
   301   1003d7: c3                      retq

 13465 000000000010a818 <typeinfo for int>:
 13466   10a818: 28 ad 10 00 00 00       sub    %ch,0x10(%rbp)
 13467   10a81e: 00 00                   add    %al,(%rax)
 13468   10a820: f2 ad                   repnz lods %ds:(%rsi),%eax
 13469   10a822: 10 00                   adc    %al,(%rax)
 13470   10a824: 00 00                   add    %al,(%rax)

The base class type_info is defined in typeinfo; cxxabi.h defines the classes derived from type_info. There are quite a few, so only some are listed here.

list 2 libsupc++/typeinfo
  1 // RTTI support for -*- C++ -*-
 25 /** @file typeinfo
 26  *  This is a Standard C++ Library header.
 27  */
 28 
 29 #ifndef _TYPEINFO
 30 #define _TYPEINFO
 31 
 32 #pragma GCC system_header
 33 
 34 #include <bits/exception.h>
 35 #include <bits/hash_bytes.h>
 36 
 37 #pragma GCC visibility push(default)
 38 
 39 extern "C++" {
 40 
 41 namespace __cxxabiv1
 42 {
 43   class __class_type_info;
 44 } // namespace __cxxabiv1
 45 
 70 
 71 namespace std
 72 {
 73   /**
 74    *  @brief  Part of RTTI.
 75    *
 76    *  The @c type_info class describes type information generated by
 77    *  an implementation.
 78   */
 79   class type_info
 80   {
 81   public:
 82     /** Destructor first. Being the first non-inline virtual function, this
 83      *  controls in which translation unit the vtable is emitted. The
 84      *  compiler makes use of that information to know where to emit
 85      *  the runtime-mandated type_info structures in the new-abi.  */
 86     virtual ~type_info();
 87 
 88     /** Returns an @e implementation-defined byte string; this is not
 89      *  portable between compilers!  */
 90     const char* name() const noexcept
 91     { return __name[0] == '*' ? __name + 1 : __name; }
 92 
 93     // On some targets we can rely on type_info's NTBS being unique,
 94     // and therefore address comparisons are sufficient.
 95     bool before(const type_info& __arg) const noexcept
 96     { return __name < __arg.__name; }
 97 
 98     bool operator==(const type_info& __arg) const noexcept
 99     { return __name == __arg.__name; }
100 
101     bool operator!=(const type_info& __arg) const noexcept
102     { return !operator==(__arg); }
103 
104     size_t hash_code() const noexcept
105     {
106       return reinterpret_cast<size_t>(__name);
107     }
108 
109     // Return true if this is a pointer type of some kind
110     virtual bool __is_pointer_p() const;
111 
112     // Return true if this is a function type
113     virtual bool __is_function_p() const;
114 
115     // Try and catch a thrown type. Store an adjusted pointer to the
116     // caught type in THR_OBJ. If THR_TYPE is not a pointer type, then
117     // THR_OBJ points to the thrown object. If THR_TYPE is a pointer
118     // type, then THR_OBJ is the pointer itself. OUTER indicates the
119     // number of outer pointers, and whether they were const
120     // qualified.
121     virtual bool __do_catch(const type_info *__thr_type, void **__thr_obj,
122        unsigned __outer) const;
123 
124     // Internally used during catch matching
125     virtual bool __do_upcast(const __cxxabiv1::__class_type_info *__target,
126         void **__obj_ptr) const;
127 
128   protected:
129     const char *__name;
130 
131     explicit type_info(const char *__n): __name(__n) { }
132 
133   private:
134     /// Assigning type_info is not supported.
135     type_info& operator=(const type_info&);
136     type_info(const type_info&);
137   };
138 
139   /**
140    *  @brief  Thrown during incorrect typecasting.
141    *  @ingroup exceptions
142    *
143    *  If you attempt an invalid @c dynamic_cast expression, an instance of
144    *  this class (or something derived from this class) is thrown.  */
145   class bad_cast : public exception
146   {
147   public:
148     bad_cast() noexcept { }
149 
150     // This declaration is not useless:
151     // http://gcc.gnu.org/onlinedocs/gcc-3.0.2/gcc_6.html#SEC118
152     virtual ~bad_cast() noexcept;
153 
154     // See comment in eh_exception.cc.
155     virtual const char* what() const noexcept;
156   };
157 
158   /**
159    *  @brief Thrown when a NULL pointer in a @c typeid expression is used.
160    *  @ingroup exceptions
161    */
162   class bad_typeid : public exception
163   {
164   public:
165     bad_typeid () noexcept { }
166 
167     // This declaration is not useless:
168     // http://gcc.gnu.org/onlinedocs/gcc-3.0.2/gcc_6.html#SEC118
169     virtual ~bad_typeid() noexcept;
170 
171     // See comment in eh_exception.cc.
172     virtual const char* what() const noexcept;
173   };
174 } // namespace std
175 
176 } // extern "C++"
177 
178 #pragma GCC visibility pop
179 
180 #endif

list 3 libsupc++/cxxabi.h
 91 #include <typeinfo>
 92 
 93 namespace __cxxabiv1
 94 {
 95   // Type information for int, float etc.
 96   class __fundamental_type_info : public std::type_info
 97   {
 98   public:
 99     explicit
100     __fundamental_type_info(const char* __n) : std::type_info(__n) { }
101 
102     virtual
103     ~__fundamental_type_info();
104   };
105 
106   // Type information for array objects.
107   class __array_type_info : public std::type_info
108   {
109   public:
110     explicit
111     __array_type_info(const char* __n) : std::type_info(__n) { }
112 
113     virtual
114     ~__array_type_info();
115   };



rtti.s
 139     .string "ni's name: %s\n"
 140     .text
 141     .globl  main
 142     .type   main, @function
 143 main:
 144 .LFB60:
 145     .loc 2 104 1
 146     .cfi_startproc
 147     pushq   %rbp
 148     .cfi_def_cfa_offset 16
 149     .cfi_offset 6, -16
 150     movq    %rsp, %rbp
 151     .cfi_def_cfa_register 6
 152     subq    $16, %rsp
 153     movq    %rdi, -8(%rbp)
 154     .loc 2 107 15
 155     movl    $_ZTIi, %edi
 156     call    _ZNKSt9type_info4nameEv
 157     movq    %rax, %rsi
 158     movl    $.LC2, %edi
 159     movl    $0, %eax
 160     call    _ZN2io6printfEPKcz

list 6 rtti.gdb
 1 107     printf("ni's name: %s\n", typeid(86).name());
 2 (gdb) p *(std::type_info*)0x10a818
 3 $2 = {_vptr.type_info = 0x10ad28 <vtable for __cxxabiv1::__fundamental_type_info+16>, 
 4   __name = 0x10adf2 <typeinfo name for int> "i"}
 5 (gdb) x/x32b 0x10a818
 6 Invalid number "32b".
 7 (gdb) x/32xb 0x10a818
 8 0x10a818 <_ZTIi>: 0x28 0xad 0x10 0x00 0x00 0x00 0x00 0x00
 9 0x10a820 <_ZTIi+8>: 0xf2 0xad 0x10 0x00 0x00 0x00 0x00 0x00
10 0x10a828 <_ZTIPi>: 0xc0 0xae 0x10 0x00 0x00 0x00 0x00 0x00
11 0x10a830 <_ZTIPi+8>: 0xef 0xad 0x10 0x00 0x00 0x00 0x00 0x00

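Friday, June 22, 2018

How cfront implements C++ member function pointers

Re: [問題] 關於Class指標的觀念 (Re: [Question] On the concept of class pointers)

I posted some thoughts in a reply to that thread, but they were basically wrong.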

Why am I having trouble taking the address of a C++ function?

Short answer: if you’re trying to store it into (or pass it as) a pointer-to-function, then that’s the problem — this is a corollary to the previous FAQ.
Long answer: In C++, member functions have an implicit parameter which points to the object (the this pointer inside the member function). Normal C functions can be thought of as having a different calling convention from member functions, so the types of their pointers (pointer-to-member-function vs pointer-to-function) are different and incompatible. C++ introduces a new type of pointer, called a pointer-to-member, which can be invoked only by providing an object.
NOTE: do not attempt to “cast” a pointer-to-member-function into a pointer-to-function; the result is undefined and probably disastrous. E.g., a pointer-to-member-function is not required to contain the machine address of the appropriate function. As was said in the last example, if you have a pointer to a regular C function, use either a top-level (non-member) function, or a static (class) member function.
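The FAQ text quoted above says that converting a member function pointer into an ordinary pointer does not work. In my earlier test I converted a member function pointer into a non-member function pointer and called it, and it appeared to run fine. Only after using cfront to look at how member function pointers are implemented did I really understand why code that seems to work is actually wrong. Please don't do this.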

h1.C
 1 #define CFRONT_CPP
 2 
 3 #ifdef CFRONT_CPP
 4 #include <stream.h>
 5 #include <stdint.h>
 6 #else
 7 #include <iostream>
 8 using namespace std;
 9 #endif
10 
11 class A
12 {
13   public:
14     virtual void foo(int a = 0)
15     {
16       printf("A %d\n", a);
17     }
18     virtual void va(int a)
19     {
20       printf("va: A %d\n", a);
21     }
22     void mf1()
23     {
24       printf("mf: mf1\n");
25     }
26 };
27 
28 class B : public A
29 {
30   public:
31     virtual void foo(int a = 1)
32     {
33       printf("B a: %d\n", a);
34     }
35 };
36 
37 int main(int argc, char *argv[])
38 {
39   A a;
40   void (A::*mf)() = &A::mf1;
41 
42   uintptr_t addr = *((uintptr_t*)&mf);
43   (*(void(*)(A *))(addr) )(&a);
44   (a.*mf)();
45 
46 #if 0
47   printf("sizeof(mf): %u\n", sizeof(mf));
48   cout << "(sizeof(mf): " << sizeof(mf) << endl;
49 #endif
50   return 0;
51 }

h1.C L40 is translated into h1..c L818:

list 1
40 void (A::*mf)() = &A::mf1;
->
016 typedef int (*__vptp)(void);
017 struct __mptr {short d; short i; __vptp f; };
811 struct __mptr __1mf ;
818 ((__1mf .d=0),((__1mf .i=-1),(__1mf .f=(((int (*)(void ))mf1__1AFv)))));

A member function pointer is in fact not a pointer but a struct; see struct __mptr at L017 of list 1. It has three fields, d, i and f, and only f points to the member function. mf1__1AFv is

void A::mf1()

When you use a member function pointer variable, you are not working with a bare pointer but with a struct __mptr, which also carries the two extra fields d and i. So using
42   uintptr_t addr = *((uintptr_t*)&mf);
43   (*(void(*)(A *))(addr) )(&a);
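to turn a member function pointer into a non-member function pointer is not a pointer-to-pointer conversion at all. It takes a struct containing d, i and f and treats it as a pointer, which cannot possibly be correct. With the cfront layout, the first word of that struct holds d and i, not f.

The code above first converts the member function pointer into an integer, then converts that integer into a non-member function pointer and calls it.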

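h1.C L42 and L43 pass through almost unchanged, as h1..c L821 and L822:

821 __1addr = ((*(((uintptr_t *)(& __1mf )))));
822 ((*(((void (*)(struct A *))__1addr ))))( & __1a ) ;

h1.C L44, (a.*mf)();, becomes h1..c L823 to L825. That code first tests i: if it is negative, f is called directly with this adjusted by d; otherwise the call goes through the vtable entry __vptr__1A[i].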

This is only how cfront does it; other compilers may implement member function pointers differently.

h1..c
001 #line 1 "h1.C"
002 
003 /* <<AT&T C++ Language System <3.0.3> 05/05/94>> */
004 char __cfront_version_303_xxxxxxxx;
005 /* < h1.C > */

016 typedef int (*__vptp)(void);
017 struct __mptr {short d; short i; __vptp f; };

805 #line 37 "h1.C"
806 int main (int __1argc , char **__1argv ){ _main(); 
807 #line 38 "h1.C"
808 { 
809 #line 39 "h1.C"
810 struct A __1a ;
811 struct __mptr __1mf ;
812 
813 #line 42 "h1.C"
814 uintptr_t __1addr ;
815 
816 #line 39 "h1.C"
817 ( ((& __1a )-> __vptr__1A = (struct __mptr *) __ptbl_vec__h1_C_[0]), (& __1a )) ;
818 ( (__1mf .d= 0 ), ( (__1mf .i= -1), (__1mf .f= (((int (*)(void ))mf1__1AFv )))) ) ;
819 
820 #line 42 "h1.C"
821 __1addr = ((*(((uintptr_t *)(& __1mf )))));
822 ((*(((void (*)(struct A *))__1addr ))))( & __1a ) ;
823 (__1mf .i< 0 )?((*(((void (*)(struct A *__0this ))__1mf .f))))( ((struct A *)((((char *)(& __1a )))+ __1mf .d))) :((*(((void (*)(struct A *__0this ))((& __1a )-> __vptr__1A [__1mf .i]).f))))(
824 #line 44 "h1.C"
825 ((struct A *)((((char *)(& __1a )))+ ((& __1a )-> __vptr__1A [__1mf .i]).d))) ;
826 
827 #line 50 "h1.C"
828 return 0 ;
829 }
830 } 
947 #line 51 "h1.C"
948 
949 /* the end */

When h1.C L47 prints the size of this member function pointer, it is really printing the size of struct __mptr {short d; short i; __vptp f; };, which is 16 bytes on my platform.

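Thursday, April 12, 2018

How cfront implements C++ virtual functions

Many people have studied C++ virtual functions, but without cfront most of them had to work from disassembly, which is painful.

I now have a working cfront build, so I can use it to see what C code a virtual function is translated into and take C++ virtual functions apart.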

hello_a.C
 1 #include <stream.h>
 2 
 3 class A
 4 {
 5   public:
 6     virtual void foo(int a = 0)
 7     {
 8       printf("A %d\n", a);
 9     }
10     virtual void va(int a)
11     {
12       printf("va: A %d\n", a);
13     }
14 };
15 
16 class B : public A
17 {
18   public:
19     virtual void foo(int a = 1)
20     {
21       printf("B a: %d\n", a);
22     }
23 };
24 
25 
26 main()
27 {
28   A *p = new A();
29   p->foo();
30   p->va(25);
31 }

hello.a.c is the C code produced by cfront; the 31 lines of C++ become 773 lines of C. First, what do class A and class B turn into? hello.a.c L633 and L643 are the corresponding definitions, both translated into a struct:
633 struct A { /* sizeof A == 8 */
636   struct __mptr *__vptr__1A ;
637 };
639 
643 struct B { /* sizeof B == 8 */
646   struct __mptr *__vptr__1A ;
647 };
Each contains a __vptr__1A:
646 struct __mptr *__vptr__1A ;
struct __mptr is defined at L17.
17 struct __mptr {short d; short i; __vptp f; };
Only __vptp f matters here: it stores the virtual function. A virtual function is really just an ordinary C function, so f simply holds its address.

Next is __ptbl_vec__hello_C_ at L766, which holds a pointer to __vtbl__1A__hello_C:
766 struct __mptr* __ptbl_vec__hello_C_[] = {
767 __vtbl__1A__hello_C,
768 
769 };
__vtbl__1A__hello_C is defined at L699:
699 struct __mptr __vtbl__1A__hello_C[] = {0,0,0,
700 0,0,(__vptp)foo__1AFi ,
701 0,0,(__vptp)va__1AFi ,
702 0,0,0};
__vtbl__1A__hello_C[0]: 0,0,0
__vtbl__1A__hello_C[1]: 0,0, foo__1AFi is class A's virtual void foo(int a = 0)
__vtbl__1A__hello_C[2]: 0,0, va__1AFi is class A's virtual void va(int a)
A *p = new A(); allocates a struct A with new (__nw__FUl is operator new) and sets p->__vptr__1A = __ptbl_vec__hello_C_[0].
This corresponds to hello.a.c L668:
668 __1p = ( (__0__X52 = 0 ), ( ((__0__X52 || (__0__X52 = (struct A *)__nw__FUl ( (unsigned long )(sizeof (struct A))) ))?(__0__X52 ->
669 #line 28 "hello.C"
670 __vptr__1A = (struct __mptr *) __ptbl_vec__hello_C_[0]):0 ), __0__X52 ) ) ;
p->foo(); calls p->__vptr__1A[1].f

This corresponds to hello.a.c L671:
671 ((*(((void (*)(struct A *__0this , int __2a ))(__1p -> __vptr__1A [1]).f))))( ((struct A *)((((char *)__1p ))+ (__1p -> __vptr__1A [1]).d)), 0 ) ;
p->va(25); calls p->__vptr__1A[2].f
This corresponds to hello.a.c L672:
672 ((*(((void (*)(struct A *__0this , int __2a ))(__1p -> __vptr__1A [2]).f))))( ((struct A *)((((char *)__1p ))+ (__1p -> __vptr__1A [2]).d)), 25 ) ;
That is all these scary-looking casts and assignments do (with a little time they can be read), and it is also the secret behind why class B can call class A's virtual function va.

hello.a.c
  1 #line 1 "hello.C"
  2 
  3 /* <<AT&T C++ Language System <3.0.3> 05/05/94>> */
  4 char __cfront_version_303_xxxxxxxx;
  5 /* < hello.C > */
  6 
  7 #pragma lib "ape/libap.a"
  8 
  9 #pragma lib "c++/libC.a"
 10 
 11 #line 1 "hello.C"
 12 void *__vec_new (void *, int , int , void *);
 13 
 14 #line 1 "hello.C"
 15 void __vec_delete (void *, int , int , void *, int , int );
 16 typedef int (*__vptp)(void);
 17 struct __mptr {short d; short i; __vptp f; };
 18 
 19 #line 1 "hello.C"
 20 extern struct __mptr* __ptbl_vec__hello_C_[];
 21 
632 #line 4 "hello.C"
633 struct A { /* sizeof A == 8 */
634 
635 #line 14 "hello.C"
636 struct __mptr *__vptr__1A ;
637 };
638 struct B;
639 
640 #line 14 "hello.C"
641 
642 #line 17 "hello.C"
643 struct B { /* sizeof B == 8 */
644 
645 #line 14 "hello.C"
646 struct __mptr *__vptr__1A ;
647 };
648 
649 #line 14 "hello.C"
650 
651 #line 6 "hello.C"
652 static void foo__1AFi (struct A *__0this , int __2a );
653 
654 #line 10 "hello.C"
655 static void va__1AFi (struct A *__0this , int __2a );
656 
657 #line 26 "hello.C"
658 int main (void ){ _main(); 
659 #line 27 "hello.C"
660 { 
661 #line 28 "hello.C"
662 struct A *__1p ;
663 
664 #line 29 "hello.C"
665 struct A *__0__X52 ;
666 
667 #line 28 "hello.C"
668 __1p = ( (__0__X52 = 0 ), ( ((__0__X52 || (__0__X52 = (struct A *)__nw__FUl ( (unsigned long )(sizeof (struct A))) ))?(__0__X52 ->
669 #line 28 "hello.C"
670 __vptr__1A = (struct __mptr *) __ptbl_vec__hello_C_[0]):0 ), __0__X52 ) ) ;
671 ((*(((void (*)(struct A *__0this , int __2a ))(__1p -> __vptr__1A [1]).f))))( ((struct A *)((((char *)__1p ))+ (__1p -> __vptr__1A [1]).d)), 0 ) ;
672 ((*(((void (*)(struct A *__0this , int __2a ))(__1p -> __vptr__1A [2]).f))))( ((struct A *)((((char *)__1p ))+ (__1p -> __vptr__1A [2]).d)), 25 ) ;
673 }
674 } 
675 #line 31 "hello.C"
676 void __sti__hello_C_main_ (void )
677 #line 664 "incl-master/incl-linux32/iostream.h"
678 { __ct__13Iostream_initFv ( & iostream_init ) ;
679 
680 #line 664 "incl-master/incl-linux32/iostream.h"
681 }
682 
683 #line 31 "hello.C"
684 void __std__hello_C_main_ (void )
685 #line 664 "incl-master/incl-linux32/iostream.h"
686 { __dt__13Iostream_initFv ( & iostream_init , 2) ;
687 
688 #line 664 "incl-master/incl-linux32/iostream.h"
689 }
690 static void foo__1BFi (
691 #line 19 "hello.C"
692 struct B *__0this , 
693 #line 19 "hello.C"
694 int __2a );
695 struct __mptr __vtbl__1B__hello_C[] = {0,0,0,
696 0,0,(__vptp)foo__1BFi ,
697 0,0,(__vptp)va__1AFi ,
698 0,0,0};
699 struct __mptr __vtbl__1A__hello_C[] = {0,0,0,
700 0,0,(__vptp)foo__1AFi ,
701 0,0,(__vptp)va__1AFi ,
702 0,0,0};
703 static void foo__1BFi (struct B *__0this , 
704 #line 19 "hello.C"
705 int __2a )
706 #line 20 "hello.C"
707 { 
708 #line 21 "hello.C"
709 printf ( (const char *)"B a: %d\n",
710 #line 21 "hello.C"
711 __2a ) ;
712 }
713 
714 #line 10 "hello.C"
715 static void va__1AFi (struct A *__0this , 
716 #line 10 "hello.C"
717 int __2a )
718 #line 11 "hello.C"
719 { 
720 #line 12 "hello.C"
721 printf ( (const char *)"va: A %d\n",
722 #line 12 "hello.C"
723 __2a ) ;
724 }
725 
726 #line 6 "hello.C"
727 static void foo__1AFi (struct A *__0this , 
728 #line 6 "hello.C"
729 int __2a )
730 #line 7 "hello.C"
731 { 
732 #line 8 "hello.C"
733 printf ( (const char *)"A %d\n",
734 #line 8 "hello.C"
735 __2a ) ;
736 }
766 struct __mptr* __ptbl_vec__hello_C_[] = {
767 __vtbl__1A__hello_C,
768 
769 };
770 
771 #line 31 "hello.C"
772 
773 /* the end */

hello_b.C differs from hello_a.C only in using new B(); the generated code differs only slightly:

hello_b.C
 1 #include <stream.h>
 2 
 3 class A
 4 {
 5   public:
 6     virtual void foo(int a = 0)
 7     {
 8       printf("A %d\n", a);
 9     }
10     virtual void va(int a)
11     {
12       printf("va: A %d\n", a);
13     }
14 };
15 
16 class B : public A
17 {
18   public:
19     virtual void foo(int a = 1)
20     {
21       printf("B a: %d\n", a);
22     }
23 };
24 
25 
26 main()
27 {
28   A *p = new B();
29   p->foo();
30   p->va(25);
31 }

hello.b.c L773

773 struct __mptr* __ptbl_vec__hello_C_[] = {
774 __vtbl__1A__hello_C,
775 __vtbl__1B__hello_C,
776 
777 };

The table has one more entry, __vtbl__1B__hello_C, defined at hello.b.c L702:
702 struct __mptr __vtbl__1B__hello_C[] = {0,0,0,
703 0,0,(__vptp)foo__1BFi ,
704 0,0,(__vptp)va__1AFi ,
705 0,0,0};

__vtbl__1B__hello_C[0]: 0,0,0
__vtbl__1B__hello_C[1]: 0,0, foo__1BFi is class B's virtual void foo(int a = 1)
__vtbl__1B__hello_C[2]: 0,0, va__1AFi is class A's virtual void va(int a)

A *p = new B(); allocates a struct B with new and sets p->__vptr__1A = __ptbl_vec__hello_C_[1].
This corresponds to hello.b.c L671:

671 __1p = (struct A *)( (__0__X52 = 0 ), ( ((__0__X52 || (__0__X52 = (struct B *)__nw__FUl ( (unsigned long )(sizeof (struct B)))
672 #line 28 "hello.C"
673 ))?( (__0__X52 = (struct B *)( (__0__X51 = (((struct A *)__0__X52 ))), ( ((__0__X51 || (__0__X51 = (struct A *)__nw__FUl ( (unsigned long
674 #line 28 "hello.C"
675 )(sizeof (struct A))) ))?(__0__X51 -> __vptr__1A = (struct __mptr *) __ptbl_vec__hello_C_[0]):0 ), __0__X51 ) ) ), (__0__X52 -> __vptr__1A = (struct __mptr *) __ptbl_vec__hello_C_[1])) :0 ),
676 #line 28 "hello.C"
677 __0__X52 ) ) ;
p->foo(); corresponds to hello.b.c L678:
678 ((*(((void (*)(struct A *__0this , int __2a ))(__1p -> __vptr__1A [1]).f))))( ((struct A *)((((char *)__1p ))+ (__1p -> __vptr__1A [1]).d)), 0 ) ;
p->va(25); corresponds to hello.b.c L679:
679 ((*(((void (*)(struct A *__0this , int __2a ))(__1p -> __vptr__1A [2]).f))))( ((struct A *)((((char *)__1p ))+ (__1p -> __vptr__1A [2]).d)), 25 ) ;
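These scary-looking casts and assignments do exactly that. The code that calls the virtual functions is identical in both programs; only the value stored in __vptr__1A differs. Once these data structures are understood, virtual functions are no longer so mysterious. As for the d and i fields in
17 struct __mptr {short d; short i; __vptp f; };
they are probably used for inheritance, perhaps for other forms of inheritance; interested readers can dig further.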
hello.b.c
632 #line 4 "hello.C"
633 struct A { /* sizeof A == 8 */
634 
635 #line 14 "hello.C"
636 struct __mptr *__vptr__1A ;
637 };
638 struct B;
639 
640 #line 14 "hello.C"
641 
642 #line 17 "hello.C"
643 struct B { /* sizeof B == 8 */
644 
645 #line 14 "hello.C"
646 struct __mptr *__vptr__1A ;
647 };
648 
649 #line 23 "hello.C"
650 
651 #line 6 "hello.C"
652 static void foo__1AFi (struct A *__0this , int __2a );
653 
654 #line 10 "hello.C"
655 static void va__1AFi (struct A *__0this , int __2a );
656 
657 #line 26 "hello.C"
658 int main (void ){ _main(); 
659 #line 27 "hello.C"
660 { 
661 #line 28 "hello.C"
662 struct A *__1p ;
663 
664 #line 29 "hello.C"
665 struct B *__0__X52 ;
666 
667 #line 29 "hello.C"
668 struct A *__0__X51 ;
669 
670 #line 28 "hello.C"
671 __1p = (struct A *)( (__0__X52 = 0 ), ( ((__0__X52 || (__0__X52 = (struct B *)__nw__FUl ( (unsigned long )(sizeof (struct B)))
672 #line 28 "hello.C"
673 ))?( (__0__X52 = (struct B *)( (__0__X51 = (((struct A *)__0__X52 ))), ( ((__0__X51 || (__0__X51 = (struct A *)__nw__FUl ( (unsigned long
674 #line 28 "hello.C"
675 )(sizeof (struct A))) ))?(__0__X51 -> __vptr__1A = (struct __mptr *) __ptbl_vec__hello_C_[0]):0 ), __0__X51 ) ) ), (__0__X52 -> __vptr__1A = (struct __mptr *) __ptbl_vec__hello_C_[1])) :0 ),
676 #line 28 "hello.C"
677 __0__X52 ) ) ;
678 ((*(((void (*)(struct A *__0this , int __2a ))(__1p -> __vptr__1A [1]).f))))( ((struct A *)((((char *)__1p ))+ (__1p -> __vptr__1A [1]).d)), 0 ) ;
679 ((*(((void (*)(struct A *__0this , int __2a ))(__1p -> __vptr__1A [2]).f))))( ((struct A *)((((char *)__1p ))+ (__1p -> __vptr__1A [2]).d)), 25 ) ;
680 }
681 } 
682 #line 31 "hello.C"
683 void __sti__hello_C_main_ (void )
684 #line 664 "incl-master/incl-linux32/iostream.h"
685 { __ct__13Iostream_initFv ( & iostream_init ) ;
686 
687 #line 664 "incl-master/incl-linux32/iostream.h"
688 }
689 
690 #line 31 "hello.C"
691 void __std__hello_C_main_ (void )
692 #line 664 "incl-master/incl-linux32/iostream.h"
693 { __dt__13Iostream_initFv ( & iostream_init , 2) ;
694 
695 #line 664 "incl-master/incl-linux32/iostream.h"
696 }
697 static void foo__1BFi (
698 #line 19 "hello.C"
699 struct B *__0this , 
700 #line 19 "hello.C"
701 int __2a );
702 struct __mptr __vtbl__1B__hello_C[] = {0,0,0,
703 0,0,(__vptp)foo__1BFi ,
704 0,0,(__vptp)va__1AFi ,
705 0,0,0};
706 struct __mptr __vtbl__1A__hello_C[] = {0,0,0,
707 0,0,(__vptp)foo__1AFi ,
708 0,0,(__vptp)va__1AFi ,
709 0,0,0};
710 static void foo__1BFi (struct B *__0this , 
711 #line 19 "hello.C"
712 int __2a )
713 #line 20 "hello.C"
714 { 
715 #line 21 "hello.C"
716 printf ( (const char *)"B a: %d\n",
717 #line 21 "hello.C"
718 __2a ) ;
719 }
720 
721 #line 10 "hello.C"
722 static void va__1AFi (struct A *__0this , 
723 #line 10 "hello.C"
724 int __2a )
725 #line 11 "hello.C"
726 { 
727 #line 12 "hello.C"
728 printf ( (const char *)"va: A %d\n",
729 #line 12 "hello.C"
730 __2a ) ;
731 }
732 
733 #line 6 "hello.C"
734 static void foo__1AFi (struct A *__0this , 
735 #line 6 "hello.C"
736 int __2a )
737 #line 7 "hello.C"
738 { 
739 #line 8 "hello.C"
740 printf ( (const char *)"A %d\n",
741 #line 8 "hello.C"
742 __2a ) ;
743 }
744 
773 struct __mptr* __ptbl_vec__hello_C_[] = {
774 __vtbl__1A__hello_C,
775 __vtbl__1B__hello_C,
776 
777 };
778 
779 #line 31 "hello.C"
780 
781 /* the end */

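Friday, December 30, 2016

Implementing C++ exception handling (2) - using g++ 5.4.0

Items 38, 39, 40 and 41 of Binary Hacks (Traditional Chinese edition) are explained with gcc 3.4.4. An executable built with the current gcc 5.4.0 (as of 2016-12-31) segfaults immediately. It is a pity that the code cannot run on a current system, so I want to get it to compile and run with gcc 5.4.0. Let's try!

You may also be interested in:
C++ exception handling (1) - the principles

As a challenge, start by building gcc 5.4.0 and glibc 2.23.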

env:
x64 debian 64 bit

The system glibc was built with dwarf-based exception handling, so statically linking a.cpp runs into trouble. I had to build my own glibc that uses sjlj exception handling.

The dwarf version calls _Unwind_RaiseException, while the sjlj version calls _Unwind_SjLj_RaiseException; the symbols are different.

Building gcc 5.4.0
../gcc-5.4.0/configure --enable-languages=c,c++ --enable-sjlj-exceptions  --disable-nls
make
make install

Building glibc 2.23
../glibc-2.23/configure --disable-nls --disable-sanity-checks
make
make install

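While building glibc, ldconfig and sln could not be linked statically, so I created those 2 files with touch. libgcc.so also could not be found, so I changed the path and added /usr/local/lib/gcc/x86_64-unknown-linux-gnu/5.4.0/libgcc.a. With that, I got a glibc that calls the sjlj family of functions.

Fix the libgcc path in the link commands: add the part highlighted in red and remove the original -lgcc.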
 1 
 3 gcc -nostdlib -nostartfiles /usr/local/lib64/libgcc_s.so -o /media/descent/usbhd/glibc-build/debug/pcprofiledump    -Wl,-z,combreloc -Wl,-z,relro -Wl,--hash-style=both /media/descent/usbhd/glibc-build/csu/crt1.o /media/descent/usbhd/glibc-build/csu/crti.o `gcc  --print-file-name=crtbegin.o` /media/descent/usbhd/glibc-build/debug/pcprofiledump.o  -Wl,-dynamic-linker=/usr/local/lib/ld-linux-x86-64.so.2 -Wl,-rpath-link=/media/descent/usbhd/glibc-build:/media/descent/usbhd/glibc-build/math:/media/descent/usbhd/glibc-build/elf:/media/descent/usbhd/glibc-build/dlfcn:/media/descent/usbhd/glibc-build/nss:/media/descent/usbhd/glibc-build/nis:/media/descent/usbhd/glibc-build/rt:/media/descent/usbhd/glibc-build/resolv:/media/descent/usbhd/glibc-build/crypt:/media/descent/usbhd/glibc-build/mathvec:/media/descent/usbhd/glibc-build/nptl /media/descent/usbhd/glibc-build/libc.so.6 /media/descent/usbhd/glibc-build/libc_nonshared.a -Wl,--as-needed /media/descent/usbhd/glibc-build/elf/ld.so -Wl,--no-as-needed  `gcc  --print-file-name=crtend.o` /media/descent/usbhd/glibc-build/csu/crtn.o
 4 
 5 gcc -nostdlib -nostartfiles /usr/local/lib64/libgcc_s.so -o /media/descent/usbhd/glibc-build/login/utmpdump    -Wl,-z,combreloc -Wl,-z,relro -Wl,--hash-style=both /media/descent/usbhd/glibc-build/csu/crt1.o /media/descent/usbhd/glibc-build/csu/crti.o `gcc  --print-file-name=crtbegin.o` /media/descent/usbhd/glibc-build/login/utmpdump.o  -Wl,-dynamic-linker=/usr/local/lib/ld-linux-x86-64.so.2 -Wl,-rpath-link=/media/descent/usbhd/glibc-build:/media/descent/usbhd/glibc-build/math:/media/descent/usbhd/glibc-build/elf:/media/descent/usbhd/glibc-build/dlfcn:/media/descent/usbhd/glibc-build/nss:/media/descent/usbhd/glibc-build/nis:/media/descent/usbhd/glibc-build/rt:/media/descent/usbhd/glibc-build/resolv:/media/descent/usbhd/glibc-build/crypt:/media/descent/usbhd/glibc-build/mathvec:/media/descent/usbhd/glibc-build/nptl /media/descent/usbhd/glibc-build/libc.so.6 /media/descent/usbhd/glibc-build/libc_nonshared.a -Wl,--as-needed /media/descent/usbhd/glibc-build/elf/ld.so -Wl,--no-as-needed `gcc  --print-file-name=crtend.o` /media/descent/usbhd/glibc-build/csu/crtn.o
 6 
 7 gcc -nostdlib -nostartfiles /usr/local/lib64/libgcc_s.so -o /media/descent/usbhd/glibc-build/elf/sprof    -Wl,-z,combreloc -Wl,-z,relro -Wl,--hash-style=both /media/descent/usbhd/glibc-build/csu/crt1.o /media/descent/usbhd/glibc-build/csu/crti.o `gcc  --print-file-name=crtbegin.o` /media/descent/usbhd/glibc-build/elf/sprof.o /media/descent/usbhd/glibc-build/dlfcn/libdl.so.2  -Wl,-dynamic-linker=/usr/local/lib/ld-linux-x86-64.so.2 -Wl,-rpath-link=/media/descent/usbhd/glibc-build:/media/descent/usbhd/glibc-build/math:/media/descent/usbhd/glibc-build/elf:/media/descent/usbhd/glibc-build/dlfcn:/media/descent/usbhd/glibc-build/nss:/media/descent/usbhd/glibc-build/nis:/media/descent/usbhd/glibc-build/rt:/media/descent/usbhd/glibc-build/resolv:/media/descent/usbhd/glibc-build/crypt:/media/descent/usbhd/glibc-build/mathvec:/media/descent/usbhd/glibc-build/nptl /media/descent/usbhd/glibc-build/libc.so.6 /media/descent/usbhd/glibc-build/libc_nonshared.a -Wl,--as-needed /media/descent/usbhd/glibc-build/elf/ld.so -Wl,--no-as-needed `gcc  --print-file-name=crtend.o` /media/descent/usbhd/glibc-build/csu/crtn.o
 8 
 9 
10 gcc -nostdlib -nostartfiles /usr/local/lib64/libgcc_s.so -o /media/descent/usbhd/glibc-build/elf/pldd    -Wl,-z,combreloc -Wl,-z,relro -Wl,--hash-style=both /media/descent/usbhd/glibc-build/csu/crt1.o /media/descent/usbhd/glibc-build/csu/crti.o `gcc  --print-file-name=crtbegin.o` /media/descent/usbhd/glibc-build/elf/pldd.o /media/descent/usbhd/glibc-build/elf/xmalloc.o  -Wl,-dynamic-linker=/usr/local/lib/ld-linux-x86-64.so.2 -Wl,-rpath-link=/media/descent/usbhd/glibc-build:/media/descent/usbhd/glibc-build/math:/media/descent/usbhd/glibc-build/elf:/media/descent/usbhd/glibc-build/dlfcn:/media/descent/usbhd/glibc-build/nss:/media/descent/usbhd/glibc-build/nis:/media/descent/usbhd/glibc-build/rt:/media/descent/usbhd/glibc-build/resolv:/media/descent/usbhd/glibc-build/crypt:/media/descent/usbhd/glibc-build/mathvec:/media/descent/usbhd/glibc-build/nptl /media/descent/usbhd/glibc-build/libc.so.6 /media/descent/usbhd/glibc-build/libc_nonshared.a -Wl,--as-needed /media/descent/usbhd/glibc-build/elf/ld.so -Wl,--no-as-needed `gcc  --print-file-name=crtend.o` /media/descent/usbhd/glibc-build/csu/crtn.o
11 #!/bin/sh
12 gcc   -shared -static-libgcc -Wl,-O1  -Wl,-z,defs -Wl,-dynamic-linker=/usr/local/lib/ld-linux-x86-64.so.2  -B/media/descent/usbhd/glibc-build/csu/  -Wl,--version-script=/media/descent/usbhd/glibc-build/libc.map -Wl,-soname=libc.so.6 -Wl,-z,combreloc -Wl,-z,relro -Wl,--hash-style=both -nostdlib -nostartfiles -e __libc_main -L/media/descent/usbhd/glibc-build -L/media/descent/usbhd/glibc-build/math -L/media/descent/usbhd/glibc-build/elf -L/media/descent/usbhd/glibc-build/dlfcn -L/media/descent/usbhd/glibc-build/nss -L/media/descent/usbhd/glibc-build/nis -L/media/descent/usbhd/glibc-build/rt -L/media/descent/usbhd/glibc-build/resolv -L/media/descent/usbhd/glibc-build/crypt -L/media/descent/usbhd/glibc-build/mathvec -L/media/descent/usbhd/glibc-build/nptl -Wl,-rpath-link=/media/descent/usbhd/glibc-build:/media/descent/usbhd/glibc-build/math:/media/descent/usbhd/glibc-build/elf:/media/descent/usbhd/glibc-build/dlfcn:/media/descent/usbhd/glibc-build/nss:/media/descent/usbhd/glibc-build/nis:/media/descent/usbhd/glibc-build/rt:/media/descent/usbhd/glibc-build/resolv:/media/descent/usbhd/glibc-build/crypt:/media/descent/usbhd/glibc-build/mathvec:/media/descent/usbhd/glibc-build/nptl /usr/local/lib64/libgcc_s.so -o /media/descent/usbhd/glibc-build/libc.so -T /media/descent/usbhd/glibc-build/shlib.lds /media/descent/usbhd/glibc-build/csu/abi-note.o /media/descent/usbhd/glibc-build/elf/soinit.os /media/descent/usbhd/glibc-build/libc_pic.os /media/descent/usbhd/glibc-build/elf/sofini.os /media/descent/usbhd/glibc-build/elf/interp.os /media/descent/usbhd/glibc-build/elf/ld.so 
13 
14 #!/bin/sh
15 gcc   -shared -static-libgcc -Wl,-O1  -Wl,-z,defs -Wl,-dynamic-linker=/usr/local/lib/ld-linux-x86-64.so.2  -B/media/descent/usbhd/glibc-build/csu/  -Wl,--version-script=/media/descent/usbhd/glibc-build/libc.map -Wl,-soname=libc.so.6 -Wl,-z,combreloc -Wl,-z,relro -Wl,--hash-style=both -nostdlib -nostartfiles -e __libc_main -L/media/descent/usbhd/glibc-build -L/media/descent/usbhd/glibc-build/math -L/media/descent/usbhd/glibc-build/elf -L/media/descent/usbhd/glibc-build/dlfcn -L/media/descent/usbhd/glibc-build/nss -L/media/descent/usbhd/glibc-build/nis -L/media/descent/usbhd/glibc-build/rt -L/media/descent/usbhd/glibc-build/resolv -L/media/descent/usbhd/glibc-build/crypt -L/media/descent/usbhd/glibc-build/mathvec -L/media/descent/usbhd/glibc-build/nptl -Wl,-rpath-link=/media/descent/usbhd/glibc-build:/media/descent/usbhd/glibc-build/math:/media/descent/usbhd/glibc-build/elf:/media/descent/usbhd/glibc-build/dlfcn:/media/descent/usbhd/glibc-build/nss:/media/descent/usbhd/glibc-build/nis:/media/descent/usbhd/glibc-build/rt:/media/descent/usbhd/glibc-build/resolv:/media/descent/usbhd/glibc-build/crypt:/media/descent/usbhd/glibc-build/mathvec:/media/descent/usbhd/glibc-build/nptl /usr/local/lib64/libgcc_s.so -o /media/descent/usbhd/glibc-build/linkobj/libc.so -T /media/descent/usbhd/glibc-build/shlib.lds /media/descent/usbhd/glibc-build/csu/abi-note.o /media/descent/usbhd/glibc-build/elf/soinit.os -Wl,--whole-archive /media/descent/usbhd/glibc-build/linkobj/libc_pic.a -Wl,--no-whole-archive /media/descent/usbhd/glibc-build/elf/sofini.os /media/descent/usbhd/glibc-build/elf/interp.os /media/descent/usbhd/glibc-build/elf/ld.so 
16 
17 gcc -nostdlib -nostartfiles /usr/local/lib64/libgcc_s.so -o /media/descent/usbhd/glibc-build/iconv/iconvconfig    -Wl,-z,combreloc -Wl,-z,relro -Wl,--hash-style=both /media/descent/usbhd/glibc-build/csu/crt1.o /media/descent/usbhd/glibc-build/csu/crti.o `gcc  --print-file-name=crtbegin.o` /media/descent/usbhd/glibc-build/iconv/iconvconfig.o /media/descent/usbhd/glibc-build/iconv/strtab.o /media/descent/usbhd/glibc-build/iconv/xmalloc.o /media/descent/usbhd/glibc-build/iconv/hash-string.o  -Wl,-dynamic-linker=/usr/local/lib/ld-linux-x86-64.so.2 -Wl,-rpath-link=/media/descent/usbhd/glibc-build:/media/descent/usbhd/glibc-build/math:/media/descent/usbhd/glibc-build/elf:/media/descent/usbhd/glibc-build/dlfcn:/media/descent/usbhd/glibc-build/nss:/media/descent/usbhd/glibc-build/nis:/media/descent/usbhd/glibc-build/rt:/media/descent/usbhd/glibc-build/resolv:/media/descent/usbhd/glibc-build/crypt:/media/descent/usbhd/glibc-build/mathvec:/media/descent/usbhd/glibc-build/nptl /media/descent/usbhd/glibc-build/libc.so.6 /media/descent/usbhd/glibc-build/libc_nonshared.a -Wl,--as-needed /media/descent/usbhd/glibc-build/elf/ld.so -Wl,--no-as-needed  `gcc  --print-file-name=crtend.o` /media/descent/usbhd/glibc-build/csu/crtn.o
18 
19 gcc -nostdlib -nostartfiles /usr/local/lib64/libgcc_s.so -o /media/descent/usbhd/glibc-build/iconv/iconv_prog    -Wl,-z,combreloc -Wl,-z,relro -Wl,--hash-style=both /media/descent/usbhd/glibc-build/csu/crt1.o /media/descent/usbhd/glibc-build/csu/crti.o `gcc  --print-file-name=crtbegin.o` /media/descent/usbhd/glibc-build/iconv/iconv_prog.o /media/descent/usbhd/glibc-build/iconv/iconv_charmap.o /media/descent/usbhd/glibc-build/iconv/charmap.o /media/descent/usbhd/glibc-build/iconv/charmap-dir.o /media/descent/usbhd/glibc-build/iconv/linereader.o /media/descent/usbhd/glibc-build/iconv/dummy-repertoire.o /media/descent/usbhd/glibc-build/iconv/simple-hash.o /media/descent/usbhd/glibc-build/iconv/xstrdup.o /media/descent/usbhd/glibc-build/iconv/xmalloc.o  -Wl,-dynamic-linker=/usr/local/lib/ld-linux-x86-64.so.2 -Wl,-rpath-link=/media/descent/usbhd/glibc-build:/media/descent/usbhd/glibc-build/math:/media/descent/usbhd/glibc-build/elf:/media/descent/usbhd/glibc-build/dlfcn:/media/descent/usbhd/glibc-build/nss:/media/descent/usbhd/glibc-build/nis:/media/descent/usbhd/glibc-build/rt:/media/descent/usbhd/glibc-build/resolv:/media/descent/usbhd/glibc-build/crypt:/media/descent/usbhd/glibc-build/mathvec:/media/descent/usbhd/glibc-build/nptl /media/descent/usbhd/glibc-build/libc.so.6 /media/descent/usbhd/glibc-build/libc_nonshared.a -Wl,--as-needed /media/descent/usbhd/glibc-build/elf/ld.so -Wl,--no-as-needed `gcc  --print-file-name=crtend.o` /media/descent/usbhd/glibc-build/csu/crtn.o
20 
21 gcc -nostdlib -nostartfiles -o /media/descent/usbhd/glibc-build/locale/localedef    -Wl,-z,combreloc -Wl,-z,relro -Wl,--hash-style=both /media/descent/usbhd/glibc-build/csu/crt1.o /media/descent/usbhd/glibc-build/csu/crti.o `gcc  --print-file-name=crtbegin.o` /media/descent/usbhd/glibc-build/locale/localedef.o /media/descent/usbhd/glibc-build/locale/ld-ctype.o /media/descent/usbhd/glibc-build/locale/ld-messages.o /media/descent/usbhd/glibc-build/locale/ld-monetary.o /media/descent/usbhd/glibc-build/locale/ld-numeric.o /media/descent/usbhd/glibc-build/locale/ld-time.o /media/descent/usbhd/glibc-build/locale/ld-paper.o /media/descent/usbhd/glibc-build/locale/ld-name.o /media/descent/usbhd/glibc-build/locale/ld-address.o /media/descent/usbhd/glibc-build/locale/ld-telephone.o /media/descent/usbhd/glibc-build/locale/ld-measurement.o /media/descent/usbhd/glibc-build/locale/ld-identification.o /media/descent/usbhd/glibc-build/locale/ld-collate.o /media/descent/usbhd/glibc-build/locale/charmap.o /media/descent/usbhd/glibc-build/locale/linereader.o /media/descent/usbhd/glibc-build/locale/locfile.o /media/descent/usbhd/glibc-build/locale/repertoire.o /media/descent/usbhd/glibc-build/locale/locarchive.o /media/descent/usbhd/glibc-build/locale/md5.o /media/descent/usbhd/glibc-build/locale/charmap-dir.o /media/descent/usbhd/glibc-build/locale/simple-hash.o /media/descent/usbhd/glibc-build/locale/xmalloc.o /media/descent/usbhd/glibc-build/locale/xstrdup.o  -Wl,-dynamic-linker=/usr/local/lib/ld-linux-x86-64.so.2 -Wl,-rpath-link=/media/descent/usbhd/glibc-build:/media/descent/usbhd/glibc-build/math:/media/descent/usbhd/glibc-build/elf:/media/descent/usbhd/glibc-build/dlfcn:/media/descent/usbhd/glibc-build/nss:/media/descent/usbhd/glibc-build/nis:/media/descent/usbhd/glibc-build/rt:/media/descent/usbhd/glibc-build/resolv:/media/descent/usbhd/glibc-build/crypt:/media/descent/usbhd/glibc-build/mathvec:/media/descent/usbhd/glibc-build/nptl /media/descent/usbhd/glibc-build/libc.so.6 
/media/descent/usbhd/glibc-build/libc_nonshared.a -Wl,--as-needed /media/descent/usbhd/glibc-build/elf/ld.so -Wl,--no-as-needed -lgcc  `gcc  --print-file-name=crtend.o` /media/descent/usbhd/glibc-build/csu/crtn.o

Some of these programs fail to build, so just create the files with touch:
touch /media/descent/usbhd/glibc-build/locale/localedef
touch /media/descent/usbhd/glibc-build/locale/locale
touch /media/descent/usbhd/glibc-build/catgets/gencat
touch /media/descent/usbhd/glibc-build/timezone/zic
touch /media/descent/usbhd/glibc-build/timezone/zdump
touch /media/descent/usbhd/glibc-build/posix/getconf
touch /media/descent/usbhd/glibc-build/io/pwd

touch /media/descent/usbhd/glibc-build/nss/getent
touch /media/descent/usbhd/glibc-build/nss/makedb
touch /media/descent/usbhd/glibc-build/sunrpc/rpcgen
touch /media/descent/usbhd/glibc-build/nscd/nscd
touch /media/descent/usbhd/glibc-build/elf/sln
touch /media/descent/usbhd/glibc-build/elf/ldconfig

sln and ldconfig are run during make install, so both need execute permission:
chmod 755 sln
chmod 755 ldconfig

Their contents are changed to:
cat ldconfig
#!/bin/sh
exit 0

cat sln
#!/bin/sh
exit 0

Things to watch for when building a.cpp with dynamic linking
When libgcc is dynamically linked, use ldd to check which shared objects get loaded:
descent@deb:eh_impl$ /usr/bin/ldd a
linux-vdso.so.1 (0x00007ffe1b261000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f549386f000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f549356b000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f5493354000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f5492fb6000)
/lib64/ld-linux-x86-64.so.2 (0x00005636522ad000)

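libgcc_s.so.1 resolves to /lib/x86_64-linux-gnu/libgcc_s.so.1, not the one we built ourselves. Set LD_LIBRARY_PATH to fix that (use the line that matches your environment):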

export LD_LIBRARY_PATH=/usr/local/lib64/ # x64 64 bit environment
export LD_LIBRARY_PATH=/usr/local/lib/ # x86 32 bit environment

Run ldd again:
descent@deb:eh_impl$ /usr/bin/ldd a
linux-vdso.so.1 (0x00007fff905b2000)
libstdc++.so.6 => /usr/local/lib64/libstdc++.so.6 (0x00007f463eb5e000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f463e843000)
libgcc_s.so.1 => /usr/local/lib64/libgcc_s.so.1 (0x00007f463e631000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f463e293000)
/lib64/ld-linux-x86-64.so.2 (0x000055bb0e8f5000)

Note the path of libgcc_s.so.1 now.

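Static linking a.cpp requires rebuilding glibc, which is why I built glibc-2.23. Why static link at all? Because under gdb the line-number mapping for libgcc.so seemed to be off, so I wanted a statically linked binary. But with 2 copies of glibc on the system I did not know how to make gcc pick the one I built, hence the rather dumb d.sh approach below. The -lunwind in it can also be removed.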

The glibc that goes with gcc 3.4.4 is 2.3.5; glibc 2.23 cannot be built with gcc 3.4.4, and the old glibc cannot be built on the current system because some tools are too new, e.g. sed, awk, make, binutils ...

d.sh for static link
1 #!/bin/sh
2 /usr/local/libexec/gcc/x86_64-unknown-linux-gnu/5.4.0/collect2 -plugin /usr/local/libexec/gcc/x86_64-unknown-linux-gnu/5.4.0/liblto_plugin.so -plugin-opt=/usr/local/libexec/gcc/x86_64-unknown-linux-gnu/5.4.0/lto-wrapper -plugin-opt=-fresolution=/tmp/ccnKcDbO.res -plugin-opt=-pass-through=-lgcc -plugin-opt=-pass-through=-lgcc_eh -plugin-opt=-pass-through=-lc -m elf_x86_64 -static -o a /usr/local/lib/crt1.o /usr/local/lib/crti.o /usr/local/lib/gcc/x86_64-unknown-linux-gnu/5.4.0/crtbeginT.o -L/usr/local/lib/gcc/x86_64-unknown-linux-gnu/5.4.0 -L/usr/local/lib/gcc/x86_64-unknown-linux-gnu/5.4.0/../../../../lib64 -L/lib/x86_64-linux-gnu -L/lib/../lib64 -L/usr/local/lib -L/usr/local/lib/gcc/x86_64-unknown-linux-gnu/5.4.0/../../.. a.o -lstdc++ -lm --start-group -lunwind -lgcc -lgcc_eh -lc --end-group /usr/local/lib/gcc/x86_64-unknown-linux-gnu/5.4.0/crtend.o /usr/local/lib/crtn.o

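At first I did not modify the lsda, and the program hit a segmentation fault as soon as it ran, so it was time to start disassembling.

With a self-built glibc I can debug this statically linked binary with gdb, which makes it much easier to follow the whole story. Sadly I still do not understand the lsda format, but I can fill in the lsda correctly anyway. How?

list 1 is the .gcc_except_table, i.e. the lsda. But L7 is not a literal value; it is something the assembler has to compute. So what do I fill in?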

list 1. g++ -S a.cpp
 1 .LFE1042:
 2         .section        .gcc_except_table
 3         .align 4
 4 .LLSDA1042:
 5         .byte   0xff
 6         .byte   0x3
 7         .uleb128 .LLSDATT1042-.LLSDATTD1042
 8 .LLSDATTD1042:
 9         .byte   0x1
10         .uleb128 .LLSDACSE1042-.LLSDACSB1042
11 .LLSDACSB1042:
12         .uleb128 0
13         .uleb128 0x1
14         .uleb128 0x1
15         .uleb128 0
16 .LLSDACSE1042:
17         .byte   0x1
18         .byte   0
19         .align 4
20         .long   _ZTIi
21 .LLSDATT1042:
22         .text

objdump -D a to the rescue: the bytes starting at list 2 L1329, ff, 03, 0x0d, 0x01, 4, 0, 1, 1, 0, 1, 0, are exactly the answer, aren't they?

list 2. objdump -D a
1322 Disassembly of section .gcc_except_table:
1323
1324 00000000004010e4 <.gcc_except_table>:
1325   4010e4:   ff                      (bad)
1326   4010e5:   ff 01                   incl   (%rcx)
1327   4010e7:   02 00                   add    (%rax),%al
1328   4010e9:   00 00                   add    %al,(%rax)
1329   4010eb:   00 ff                   add    %bh,%bh
1330   4010ed:   03 0d 01 04 00 01       add    0x1000401(%rip),%ecx        # 14014f4 <_end+0xdff30c>
1331   4010f3:   01 00                   add    %eax,(%rax)
1332   4010f5:   01 00                   add    %eax,(%rax)
1333   4010f7:   00 d0                   add    %dl,%al
1334   4010f9:   21 60 00                and    %esp,0x0(%rax)
1335
1336 Disassembly of section .init_array:

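Put this string of mysterious numbers into the binary hacks example program in place of the old ones, and the program builds with g++ 5.4.0 and handles the exception successfully. Sweet!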
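To make those bytes a little less mysterious, here is a minimal sketch that decodes only the header part of the sequence. It assumes the usual header order (LPStart encoding, type table encoding, type table offset, call-site encoding, call-site table length) and only handles the case where the first byte is 0xff, meaning no LPStart value follows. The struct, the field names and parse_lsda_header are my own names for illustration, not the ones used in libsupc++.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Fields at the start of an LSDA, in the order they appear (names are mine).
struct LsdaHeader
{
  unsigned lpstart_encoding;       // 0xff means "no LPStart value follows"
  unsigned ttype_encoding;         // how each type table entry is stored
  std::uint64_t ttype_offset;      // distance from here to the end of the type table
  unsigned call_site_encoding;     // how call-site table fields are stored
  std::uint64_t call_site_length;  // size of the call-site table in bytes
  std::size_t header_size;         // bytes consumed by the header itself
};

// Read one unsigned LEB128 number and advance p past it.
static std::uint64_t read_uleb128(const unsigned char *&p)
{
  std::uint64_t result = 0;
  unsigned shift = 0;
  unsigned char byte;
  do
  {
    byte = *p++;
    result |= static_cast<std::uint64_t>(byte & 0x7f) << shift;
    shift += 7;
  } while (byte & 0x80);
  return result;
}

// Hypothetical helper: decodes the header only, LPStart must be omitted.
static LsdaHeader parse_lsda_header(const unsigned char *lsda)
{
  const unsigned char *p = lsda;
  LsdaHeader h = {};
  h.lpstart_encoding = *p++;
  assert(h.lpstart_encoding == 0xff);   // assumption of this sketch
  h.ttype_encoding = *p++;
  if (h.ttype_encoding != 0xff)         // the offset exists only if there is a type table
    h.ttype_offset = read_uleb128(p);
  h.call_site_encoding = *p++;
  h.call_site_length = read_uleb128(p);
  h.header_size = static_cast<std::size_t>(p - lsda);
  return h;
}

// The byte sequence quoted from the objdump listing.
static const unsigned char list2_bytes[] =
  {0xff, 0x03, 0x0d, 0x01, 0x04, 0x00, 0x01, 0x01, 0x00, 0x01, 0x00};
```

Calling parse_lsda_header(list2_bytes) consumes the first five bytes; the four bytes after that are the call-site table and the last two are the action table, which this sketch deliberately leaves undecoded.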

For the complete source code see https://github.com/descent/eh_impl; it supports gcc 3.4.4 and gcc 5.4.0.

Sunday, December 25, 2016

Implementing c++ exception handling (1) - principles


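Update 20161227
The original post was really badly written. I published it first so I would remember to improve it, but I did not expect to be able to improve it this soon. I had forgotten to consult items 38, 39, 40, 41 of the Traditional Chinese edition of binary hacks. I had read them several times before without understanding them, and this time they finally made sense, so I am quickly adding a more respectable write-up. If you could see my git log, you would understand how much effort went into this post.

I have always preferred to explain a concept with a small program rather than with abstract theory. You can trace it with gdb, observe the results, and disassemble the executable. I consider these essential steps in learning to program: with only a verbal description and nothing concrete, you feel like you understand, yet a vague sense remains that something has not clicked.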

You may also be interested in:
c++ exception handling (2) - using g++ 5.4.0

fig 0. The pyramid of knowledge barriers

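c++ exception handling is seriously complicated. So far I only understand the principle behind the implementation; the details are too complex and I have not figured them out. c++ interviews often ask how virtual functions are implemented, but I have never seen one ask how c++ exception handling is implemented. The only reason is that it is so hard that very few people know. Not knowing how to implement exception handling is nothing to be ashamed of; even cfront could not handle it.

Tackling something this hard with no economic payoff, I must be nuts. Enough chatter, let's see how gcc implements c++ exception handling.

vc and gcc take different approaches; I am studying the gcc approach.

I read quite a few references. This post is based mainly on items 38, 39, 40, 41 of the Traditional Chinese edition of binary hacks, because it comes with a small program that can be used to experiment with and explain exception handling.

These 3 functions are the key ones: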
1 __cxa_throw
2 _Unwind_RaiseException
3 __gxx_personality_v0 (int version, _Unwind_Action actions, _Unwind_Exception_Class exception_class, struct _Unwind_Exception *ue_header, struct _Unwind_Context *context)

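The source for the unwinder functions is under gcc's libgcc directory, while the c++ specific ones are under libstdc++-v3/libsupc++ (see the paths below). libgcc is a rather mysterious library that holds the implementation of most of gcc's special tricks: unwind, software floating point ... all live there.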

Where the unwinder sources live in each version:
gcc-3.4.4/gcc
gcc-5.4.0/libgcc

_Unwind_SjLj_RaiseException, _Unwind_RaiseException
gcc-5.4.0/libgcc/unwind-sjlj.c
gcc-5.4.0/libgcc/unwind.inc

__gxx_personality_v0
gcc-5.4.0/libstdc++-v3/libsupc++/eh_personality.cc
#define PERSONALITY_FUNCTION __gxx_personality_v0

PERSONALITY_FUNCTION (int version,
_Unwind_Action actions,
_Unwind_Exception_Class exception_class,
struct _Unwind_Exception *ue_header,
struct _Unwind_Context *context)

__cxa_throw
gcc-5.4.0/libstdc++-v3/libsupc++/eh_throw.cc
extern "C" void __cxxabiv1::__cxa_throw (void *obj, std::type_info *tinfo, void (_GLIBCXX_CDTOR_CALLABI *dest) (void *))

a.cpp L116 throw 100;
is turned into calls to (ref a.cpp L118 ~ 120)
__cxa_allocate_exception()
__cxa_throw()
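The same lowering can be tried without any sjlj setup. The sketch below is my own minimal version, not the book's code: it assumes a libstdc++ whose <cxxabi.h> declares abi::__cxa_allocate_exception and abi::__cxa_throw, and the function name throw_by_hand_and_catch is made up for the example. The exception is raised by hand and caught by an ordinary catch (int).

```cpp
#include <cxxabi.h>
#include <typeinfo>

// Raise an int exception through the runtime entry points directly,
// then catch it with a normal handler. Returns the caught value.
int throw_by_hand_and_catch()
{
  try
  {
    // Step 1: ask the runtime for storage for the thrown object.
    void *obj = abi::__cxa_allocate_exception(sizeof(int));
    // Step 2: construct the thrown object in that storage (the 100 in throw 100).
    *static_cast<int *>(obj) = 100;
    // Step 3: hand it to the runtime with its type_info; an int needs no destructor.
    abi::__cxa_throw(obj, const_cast<std::type_info *>(&typeid(int)), nullptr);
  }
  catch (int value)
  {
    return value;
  }
  return -1;  // not reached, __cxa_throw does not return
}
```

Because the catch side is still compiler generated here, this only demonstrates the throw half of what a.cpp does by hand.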

The flow when __cxa_throw() fires:
__cxxabiv1::__cxa_throw
->
what actually runs is _Unwind_SjLj_RaiseException

#ifdef _GLIBCXX_SJLJ_EXCEPTIONS
_Unwind_SjLj_RaiseException (&header->exc.unwindHeader);
#else
_Unwind_RaiseException (&header->exc.unwindHeader);
#endif
|
|
|->   __gxx_personality_sj0
|
|
|-> uw_install_context

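uw_install_context calls longjmp to go back to the previous function; for a.cpp that is func1().

What is __gxx_personality_sj0 for? It searches for a matching catch statement, or for an object that needs to be destroyed and whose destructor must run. The address to jump to has a fancy name: landing_pad. In the source code you will see landing_pad = info.LPStart + cs_lp;, which computes the address of the destructor call or catch statement to run. Once uw_install_context executes, control jumps to that address.

func1() has an object that needs to be destroyed, and __gxx_personality_sj0 knows that, which is why _Unwind_RaiseException jumps to func1. Magical, right? Once you remove the Obj obj at a.cpp L128 from func1(), control no longer jumps back to func1().

How does __gxx_personality_sj0 know all this? That part is complicated. It relies on information g++ emits in dwarf form at compile time, and how to pull that information out is equally mysterious. It has to do with CIE and FDE, but I do not know what those two are or how to extract them; even after reading the source code I still do not get it.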

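__gxx_personality_sj0 also compares the thrown exception object against the exception object of each catch. Only when they match does landing_pad point to that catch. This is why exception handling needs rtti support: the rtti type_info objects are what is used to check whether the two exception types agree.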
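The comparison itself can be sketched with nothing but std::type_info. This is a simplification I made up for illustration: it only checks for exact type equality, whereas the real runtime also accepts things like a derived class being caught by a base class handler. find_matching_catch and demo_catch_types are hypothetical names.

```cpp
#include <exception>
#include <typeinfo>

// Return the index of the first catch clause whose type equals the thrown
// type, or -1 when no clause matches (the unwinder would keep going then).
static int find_matching_catch(const std::type_info &thrown,
                               const std::type_info *const catch_types[],
                               int count)
{
  for (int i = 0; i < count; ++i)
    if (*catch_types[i] == thrown)  // type_info::operator== does the rtti comparison
      return i;
  return -1;
}

// Example handler list: catch (std::exception &) first, then catch (int).
static const std::type_info *const demo_catch_types[] =
  { &typeid(std::exception), &typeid(int) };
```

With this handler list a thrown int selects index 1, and a thrown double selects nothing, so the search would continue in the caller.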

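bt.cpp simulates only half of this. It uses setjmp/longjmp, and back_to_func returns to an earlier function; sjlj chains its jmp_bufs together in a similar way. What I do not know is how to use the data in the .eh_frame and .gcc_except_table sections to find out whether a destructor needs to run or whether there is a matching catch statement.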

bt.cpp
#include <setjmp.h>
#include <stdio.h>
#include <map>
#include <string>

using namespace std;

// jmp_buf is an array type and cannot be stored in a map by value,
// so remember a pointer to each function's jmp_buf instead.
map<string, jmp_buf *> stack_frame;

// Jump back to the setjmp point registered under the name fn.
void back_to_func(const char *fn)
{
  // Look up the target first so no temporary string is alive at the longjmp.
  jmp_buf *target = stack_frame[fn];
  longjmp(*target, 5);
}

void f3()
{
  printf("in f3\n");
  back_to_func("f2");
}

void f2()
{
  jmp_buf frame;
  if (setjmp(frame) == 0)
  {
    stack_frame["f2"] = &frame;
    f3();
  }
  else
  {
    printf("back to f2\n");
    back_to_func("f1");
  }
}

void f1()
{
  jmp_buf frame;
  if (setjmp(frame) == 0)
  {
    stack_frame["f1"] = &frame;
    f2();
  }
  else
  {
    printf("back to f1\n");
    back_to_func("main");
  }
}

int main(int argc, char *argv[])
{
  jmp_buf frame;
  if (setjmp(frame) == 0)
  {
    stack_frame["main"] = &frame;
    f1();
  }
  else
  {
    printf("back to main\n");
  }
  printf("end main\n");
  return 0;
}

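Items 38, 39, 40, 41 of the Traditional Chinese edition of binary hacks use gcc 3.4.4. It is dated, but the basic principles are the same, so let's start by building gcc 3.4.4.

g++ can support c++ exception handling in two ways, setjmp/longjmp and dwarf. The current gcc 5 does not seem to use --enable-sjlj-exceptions. I am more familiar with setjmp/longjmp, and dwarf2 is too painful a road that I do not want to take, so I build gcc 3.4.4 with --enable-sjlj-exceptions first.

I am learning with the familiar setjmp/longjmp: building gcc 3.4.4 with --enable-sjlj-exceptions selects the exception handling implementation based on setjmp/longjmp.

setjmp/longjmp and dwarf are both used for unwinding, i.e. getting from the current function back to the previous one, similar to what bt.cpp does. The dwarf approach requires understanding the dwarf format, which I hear is insanely complicated, and I do not want to spend time on it. I already know how setjmp/longjmp work, so no extra effort is needed there.

The other capability required is knowing which function to return to. That comes from the contents of the mysterious LSDA: g++ inserts some information into the .gcc_except_table section, which __gxx_personality_sj0 uses to decide which function to go back to.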

env:
32 bit debian

Building gcc-3.4.4
tar xvf gcc-3.4.4.tar.bz2
mkdir gcc-build
cd gcc-build
../gcc-3.4.4/configure --enable-languages=c,c++ --enable-sjlj-exceptions
make
make install

You may hit some header problems during the build; I symlinked /usr/include/i386-linux-gnu/* into /usr/include

root@debian32:/usr/include# ls -l sys
lrwxrwxrwx 1 root root 32 Dec 26 15:42 sys -> /usr/include/i386-linux-gnu/sys

Build errors from a g++ without --enable-sjlj-exceptions support
descent@debian64:eh_impl$ g++ -g -o a a.cpp
/tmp/ccRq4BBp.o: In function `main':
/home/descent/git/eh_impl/a.cpp:126: undefined reference to `__gxx_personality_sj0'
/home/descent/git/eh_impl/a.cpp:138: undefined reference to `_Unwind_SjLj_Register'
/home/descent/git/eh_impl/a.cpp:142: undefined reference to `_Unwind_SjLj_Unregister'
collect2: error: ld returned 1 exit status

A g++ with --enable-sjlj-exceptions support
descent@debian32:eh_impl$ g++ -v
Reading specs from /usr/local/lib/gcc/i686-pc-linux-gnu/3.4.4/specs
Configured with: ../gcc-3.4.4/configure --enable-languages=c,c++ --enable-sjlj-exceptions
Thread model: posix
gcc version 3.4.4

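a.cpp is the example from the binary hacks book. It gives a side-by-side view of how try/catch/throw is translated into ordinary c++ code, which makes things clear at a glance. The most troublesome part is what kind of data structure the lsda really is. Unfortunately the book does not explain it clearly either, so it looks like I have to go to the source itself.

list 1. running a.cpp
/usr/local/bin/g++ -g -o a a.cpp 

Use ldd to check the shared objects: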

descent@debian32:eh_impl$ ldd a
linux-gate.so.1 (0xb77bf000)
libstdc++.so.6 => /usr/lib/i386-linux-gnu/libstdc++.so.6 (0xb763a000)
libm.so.6 => /lib/i386-linux-gnu/libm.so.6 (0xb75e5000)
libgcc_s.so.1 => /lib/i386-linux-gnu/libgcc_s.so.1 (0xb75c8000) # not dynamically linked to the libgcc_s.so.1 we built
libc.so.6 => /lib/i386-linux-gnu/libc.so.6 (0xb7411000)
/lib/ld-linux.so.2 (0x8007d000)

debian32:eh_impl$ export LD_LIBRARY_PATH=/usr/local/lib # x86 32 bit environment

Run ldd again:
descent@debian32:eh_impl$ ldd a
linux-gate.so.1 (0xb779e000)
libstdc++.so.6 => /usr/local/lib/libstdc++.so.6 (0xb76b6000)
libm.so.6 => /lib/i386-linux-gnu/libm.so.6 (0xb764f000)
libgcc_s.so.1 => /usr/local/lib/libgcc_s.so.1 (0xb7647000) # linked to the libgcc_s.so.1 from our own gcc 3.4.4 build
libc.so.6 => /lib/i386-linux-gnu/libc.so.6 (0xb7490000)
/lib/ld-linux.so.2 (0x800e2000)

Now it is right.

descent@debian32:eh_impl$ ./a 
func1 begin
obj ctor
func2 begin
obj dtor
thrown_obj: 100

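The run in list 1 calls the destructor successfully and reaches the correct catch code. Try it under gdb. Half of the mystery of exception handling is gone; the other half is still inside the functions in libunwind and libgcc.

a.cpp L116 is what the 3 lines L118 ~ 120 do; a.cpp L137 ~ 145 is what all the lines in L149 ~ 167 do.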

a.cpp
  1 // test c++ exception handle by g++ 3.4.4
  2 // example code from binary hacks chinese version, page 145
  3 
  4 #include <cstdio>
  5 #include <iostream>
  6 #include <typeinfo>
  7 using namespace std;
  8 
  9 #include <unwind.h>
 10 
 11 extern "C" 
 12 {
 13   // libsupc++/eh_alloc.cc
 14   void * __cxa_allocate_exception(std::size_t thrown_size);
 15 
 16   // libsupc++/eh_throw.cc
 17   //void __cxa_throw (void *obj, std::type_info *tinfo, void (*dest) (void *));
 18   void __cxa_throw (void *obj, void *tinfo, void (*dest) (void *));
 19 
 20   // libsupc++/eh_catch.cc
 21   void * __cxa_begin_catch (void *exc_obj_in);
 22   void __cxa_end_catch ();
 23 
 24 
 25   #define PERSONALITY_FUNCTION    __gxx_personality_sj0
 26 
 27   // libsupc++/eh_personality.cc
 28   _Unwind_Reason_Code PERSONALITY_FUNCTION (int version,
 29                       _Unwind_Action actions,
 30                       _Unwind_Exception_Class exception_class,
 31                       struct _Unwind_Exception *ue_header,
 32                       struct _Unwind_Context *context);
 33 
 34 
 35 }
 36 
 37 struct Lsda
 38 {
 39   unsigned char start_format;
 40   unsigned char type_format;
 41   unsigned char type_length;
 42   unsigned char call_site_format;
 43   unsigned char call_site_length;
 44   unsigned char call_site_table[2];
 45   signed char action_table[2];
 46   const std::type_info *catch_type[1];
 47 }__attribute__((packed));
 48 
 49 Lsda my_lsda=
 50 {
 51   0xff,
 52   0x00,
 53   10,
 54   0x01,
 55   2,
 56   {0,1},
 57   {1,0},
 58   &typeid(int),
 59 };
 60 
 61 
 62 // unwind-sjlj.c
 63 /* This structure is allocated on the stack of the target function.
 64    This must match the definition created in except.c:init_eh.  */
 65 struct SjLj_Function_Context
 66 {
 67   /* This is the chain through all registered contexts.  It is
 68      filled in by _Unwind_SjLj_Register.  */
 69   struct SjLj_Function_Context *prev;
 70   
 71   /* This is assigned in by the target function before every call
 72      to the index of the call site in the lsda.  It is assigned by
 73      the personality routine to the landing pad index.  */
 74   int call_site;
 75   
 76   /* This is how data is returned from the personality routine to
 77      the target function's handler.  */
 78   _Unwind_Word data[4];
 79   
 80   /* These are filled in once by the target function before any
 81      exceptions are expected to be handled.  */
 82   _Unwind_Personality_Fn personality;
 83   void *lsda;
 84 
 85 #ifdef DONT_USE_BUILTIN_SETJMP
 86   /* We don't know what sort of alignment requirements the system
 87      jmp_buf has.  We over estimated in except.c, and now we have
 88      to match that here just in case the system *didn't* have more
 89      restrictive requirements.  */
 90   jmp_buf jbuf __attribute__((aligned));
 91 #else
 92   void *jbuf[];
 93 #endif 
 94 };
 95 
 96 //#define CXX_EH
 97 
 98 class Obj
 99 {
100   public:
101     Obj()
102     {
103       cout << "obj ctor" << endl;
104     }
105     ~Obj()
106     {
107       cout << "obj dtor" << endl;
108     }
109 
110 };
111 
112 void func2()
113 {
114   cout << "func2 begin" << endl;
115 #ifdef CXX_EH
116   throw 100;
117 #else
118   void *throw_obj = __cxa_allocate_exception(sizeof(int));
119   *(int*)throw_obj = 100; // this is the 100 in throw 100
120   __cxa_throw(throw_obj, (std::type_info*)&typeid(int), NULL);
121 #endif
122   cout << "func2 end" << endl;
123 }
124 
125 void func1()
126 {
127   cout << "func1 begin" << endl;
128   Obj obj;
129 
130   func2();
131   cout << "func1 end" << endl;
132 }
133 
134 int main(int argc, char *argv[])
135 {
136 #ifdef CXX_EH
137   try
138   {
139     cout << "hello" << endl; 
140     func1();
141   }
142   catch (int eh)
143   {
144     cout << "catch int: " << eh << endl; 
145   }
146 
147 #else
148   
149   SjLj_Function_Context sjlj;
150 
151   sjlj.personality = __gxx_personality_sj0;
152   sjlj.lsda = (void*)&my_lsda;
153   sjlj.call_site = 1;
154 
155   if (__builtin_setjmp(sjlj.jbuf) == 1)
156   {
157     void *thrown_obj = __cxa_begin_catch((void*)sjlj.data[0]);
158     printf("thrown_obj: %d\n", *(int*)thrown_obj);
159     __cxa_end_catch();
160   }
161   else
162   {
163     _Unwind_SjLj_Register(&sjlj);
164     //throw 100;
165     func1();
166   }
167   _Unwind_SjLj_Unregister(&sjlj);
168 #endif
169   return 0;
170 }

objdump -d a does not show a detailed enough disassembly, so I disassembled with gdb instead; that was an unexpected bonus.

list 2. dis.gdb
 1   >0x8048b16 <func1()+96>  lea    -0x28(%ebp),%eax
 2    0x8048b19 <func1()+99>  mov    %eax,(%esp)       
 3    0x8048b1c <func1()+102> call   0x8048d2e <Obj::Obj()>
 4    0x8048b21 <func1()+107> movl   $0x1,-0x58(%ebp)           
 5    0x8048b28 <func1()+114> call   0x8048a32 <func2()> 
 6    0x8048b2d <func1()+119> movl   $0x8048e5a,0x4(%esp)     
 7    0x8048b35 <func1()+127> movl   $0x804b080,(%esp)       
 8    0x8048b3c <func1()+134> call   0x80487d0 <_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc@plt> 
 9    0x8048b41 <func1()+139> movl   $0x8048780,0x4(%esp)
10    0x8048b49 <func1()+147> mov    %eax,(%esp)        
11    0x8048b4c <func1()+150> call   0x8048770 <_ZNSolsEPFRSoS_E@plt> 
12    0x8048b51 <func1()+155> jmp    0x8048b8c <func1()+214>    
13    0x8048b53 <func1()+157> lea    0x18(%ebp),%ebp    
14    0x8048b56 <func1()+160> mov    -0x54(%ebp),%eax  
15    0x8048b59 <func1()+163> mov    %eax,-0x64(%ebp) 
16    0x8048b5c <func1()+166> mov    -0x64(%ebp),%edx 
17    0x8048b5f <func1()+169> mov    %edx,-0x60(%ebp)   
18    0x8048b62 <func1()+172> lea    -0x28(%ebp),%eax  
19    0x8048b65 <func1()+175> mov    %eax,(%esp)      
20    0x8048b68 <func1()+178> movl   $0x0,-0x58(%ebp) 
21    0x8048b6f <func1()+185> call   0x8048d02 <Obj::~Obj()>  the 1st dtor 
22    0x8048b74 <func1()+190> mov    -0x60(%ebp),%eax  
23    0x8048b77 <func1()+193> mov    %eax,-0x64(%ebp) 
24    0x8048b7a <func1()+196> mov    -0x64(%ebp),%edx
25    0x8048b7d <func1()+199> mov    %edx,(%esp)    
26    0x8048b80 <func1()+202> movl   $0xffffffff,-0x58(%ebp) 
27    0x8048b87 <func1()+209> call   0x80487e0 <_Unwind_SjLj_Resume@plt> 
28    0x8048b8c <func1()+214> lea    -0x28(%ebp),%eax      
29    0x8048b8f <func1()+217> mov    %eax,(%esp)          
30    0x8048b92 <func1()+220> movl   $0xffffffff,-0x58(%ebp)
31    0x8048b99 <func1()+227> call   0x8048d02 <Obj::~Obj()>          the 2nd dtor  
32    0x8048b9e <func1()+232> lea    -0x5c(%ebp),%eax 
33    0x8048ba1 <func1()+235> mov    %eax,(%esp)     
34    0x8048ba4 <func1()+238> call   0x8048810 <_Unwind_SjLj_Unregister@plt>
35    0x8048ba9 <func1()+243> add    $0x6c,%esp     
36    0x8048bac <func1()+246> pop    %ebx          
37    0x8048bad <func1()+247> pop    %esi         
38    0x8048bae <func1()+248> pop    %edi        
39    0x8048baf <func1()+249> pop    %ebp       
40    0x8048bb0 <func1()+250> ret   

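list 2 L21 and L31 show 2 dtor calls. Strange, right? L21 is for exception handling: when control comes back to func1 from the throw, it mysteriously arrives here. In fact it comes back at L13, 0x8048b53, and then runs L27 to go back to the previous stack frame (main in this example). L31 is the dtor call for the normal execution path, and L12 has a sneaky jmp that skips over the exception path. Every trick in the book.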
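Seen from the c++ side, the two dtor call sites just mean the destructor runs exactly once whichever way the function is left. A minimal sketch with made-up names (Tracked, work, count_dtor_calls) that counts the calls on both paths:

```cpp
// Counts how many times the destructor below has run.
static int dtor_calls = 0;

struct Tracked
{
  ~Tracked() { ++dtor_calls; }
};

// Leaves either by a normal return or by throwing, depending on fail.
static void work(bool fail)
{
  Tracked obj;
  if (fail)
    throw 100;  // exception path: the compiler-inserted cleanup destroys obj
}               // normal path: obj is destroyed at the closing brace

// Run work() once and report how many destructor calls happened.
static int count_dtor_calls(bool fail)
{
  dtor_calls = 0;
  try
  {
    work(fail);
  }
  catch (int)
  {
  }
  return dtor_calls;
}
```

Both count_dtor_calls(false) and count_dtor_calls(true) report one call; the two call sites in the disassembly are simply one per exit path.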

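list 3 is the assembly generated by g++ 3.4.4, which is clearer still; I should have thought of it sooner. It not only cleared up my confusion about the 2 dtors, it also revealed why control mysteriously arrives in func1(), and it even sorted out the Lsda for me. Because I now knew the Lsda contents, I was able to produce the g++ 5.4.0 version as well.

list 3 is the version using try/catch/throw. Don't list 3 L302 and L303 look just like what I fill in by hand at a.cpp L151 ~ 153?

list 3 L303 and L387 are that damned lsda. list 3 L387 ~ L404 sit in the .gcc_except_table section (that is the LSDA, Language Specific Data Area), which is yet another sneaky spot.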

How did I get the g++ 5.4.0 version? I built the try/catch/throw version with g++ 5.4.0, disassembled it, and copied the .gcc_except_table section into the lsda. Sure enough, it really is different.
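One number in the hand-written my_lsda that can be derived rather than copied is the third byte, the 10. My reading, which is an interpretation and not something the book states, is that it is the distance from the byte right after it to the end of the type table. For the Lsda struct in a.cpp that is simple arithmetic:

```cpp
#include <cstddef>
#include <typeinfo>

// Same layout as the Lsda struct in a.cpp.
struct Lsda
{
  unsigned char start_format;
  unsigned char type_format;
  unsigned char type_length;
  unsigned char call_site_format;
  unsigned char call_site_length;
  unsigned char call_site_table[2];
  signed char action_table[2];
  const std::type_info *catch_type[1];
} __attribute__((packed));

// Bytes from the field right after type_length through the end of the
// struct: 1 + 1 + 2 + 2 + sizeof(pointer).
static const std::size_t ttype_offset =
  sizeof(Lsda) - offsetof(Lsda, call_site_format);
```

On the 32 bit system used here a pointer is 4 bytes, so ttype_offset comes out as 10, matching the value in my_lsda. On a 64 bit build the same struct would give 14, which is one reason the table cannot be copied between builds unchanged.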

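Next, how does control mysteriously get back into func1? This puzzled me for a long time, and tracing with gdb got me nowhere. In theory there has to be a setjmp here for a longjmp to return to, but I could never find a call to setjmp. Only after running g++ -S did I see it: g++ inserts setjmp-like code into func1, and that is what lets _Unwind_RaiseException return to func1.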

list 3 L182 _Unwind_SjLj_Register acts like the map keyed by function name in bt.cpp: it records where each function should be re-entered. Its argument, SjLj_Function_Context, contains a jmp_buf, which has to be filled in first so that the longjmp in uw_install_context comes back here.
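The prev field of SjLj_Function_Context suggests the registrations form a linked list with the innermost function at the head. Below is a simplified single-threaded sketch of that idea. FunctionContext, register_context and unregister_context are hypothetical names, and the name field merely stands in for the jbuf, lsda and personality fields of the real structure.

```cpp
// Stripped-down stand-in for SjLj_Function_Context.
struct FunctionContext
{
  FunctionContext *prev;  // the context that was innermost before this one
  const char *name;       // hypothetical label, replaces jbuf/lsda/personality
};

// Head of the chain: the most recently registered, still active function.
static FunctionContext *innermost = nullptr;

// Called on function entry: push this function's context onto the chain.
static void register_context(FunctionContext *fc)
{
  fc->prev = innermost;
  innermost = fc;
}

// Called on function exit: pop it again, restoring the caller's context.
static void unregister_context(FunctionContext *fc)
{
  innermost = fc->prev;
}
```

An unwinder built on this would start at innermost and follow prev outward, asking each context's personality routine whether it has a cleanup or a handler to run.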

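Since the code is inserted by g++, you have to read it out of the assembly, which is hard. The .L18 at L177 is the value a setjmp would record. This is where the jmp_buf mentioned above gets filled in, but no call to setjmp is generated; the code just stores the values the jmp_buf needs. So when _Unwind_RaiseException fires uw_install_context, control returns to L202, which matches what gdb shows.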
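The same "no call to setjmp" effect is available from source code through the gcc builtins that a.cpp already uses. A minimal sketch, assuming gcc (or a compatible compiler) on x86; jbuf, jump_back and came_back_via_longjmp are names I made up:

```cpp
// The builtins use a buffer of five pointer-sized words.
static void *jbuf[5];

// __builtin_longjmp must not be in the same function as __builtin_setjmp,
// so keep it in its own function and forbid inlining.
static void jump_back() __attribute__((noinline));
static void jump_back()
{
  __builtin_longjmp(jbuf, 1);  // the second argument has to be the constant 1
}

// Returns 1 when control came back through __builtin_longjmp.
static int came_back_via_longjmp()
{
  if (__builtin_setjmp(jbuf) == 0)
  {
    // First pass: the resume address and stack state are now stored in jbuf.
    jump_back();
    return 0;  // not reached
  }
  return 1;    // second pass: we resumed right after __builtin_setjmp
}
```

Compiling this with g++ -S should show the buffer being filled in inline with a label address, with no call to setjmp, much like L174 ~ L179 of list 3.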

Remove Obj obj; from func1() and look at the a.s that g++ generates again: func1 now looks just like a c function, with none of that extra code quietly inserted.

#define uw_install_context(CURRENT, TARGET)       \
  do                                              \
  {                                               \
    _Unwind_SjLj_SetContext ((TARGET)->fc);    \
    longjmp ((TARGET)->fc->jbuf, 1);        \
  }                                               \
  while (0)

Doesn't list 3 L172 ~ 173 do something similar, storing __gxx_personality_sj0 and the lsda? The lsda is the part I still have not cracked.

list 3. g++-3.4.4 -S a.cpp a-3.3.4.s
  1  .file "a.cpp"
  2  .text
  3  .align 2
  4  .type _ZSt17__verify_groupingPKcjRKSs, @function
116 .LC0:
117  .string "func2 begin"
118 .LC1:
119  .string "func2 end"
120  .text
121  .align 2
122 .globl _Z5func2v
123  .type _Z5func2v, @function
124 _Z5func2v:
125  pushl %ebp
126  movl %esp, %ebp
127  subl $24, %esp
128  movl $.LC0, 4(%esp)
129  movl $_ZSt4cout, (%esp)
130  call _ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc
131  movl $_ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_, 4(%esp)
132  movl %eax, (%esp)
133  call _ZNSolsEPFRSoS_E
134  movl $4, (%esp)
135  call __cxa_allocate_exception
136  movl $100, (%eax)
137 .L11:
138  movl $0, 8(%esp)
139  movl $_ZTIi, 4(%esp)
140  movl %eax, (%esp)
141  call __cxa_throw
142  movl $.LC1, 4(%esp)
143  movl $_ZSt4cout, (%esp)
144  call _ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc
145  movl $_ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_, 4(%esp)
146  movl %eax, (%esp)
147  call _ZNSolsEPFRSoS_E
148  leave
149  ret
150 .L10:
151  .size _Z5func2v, .-_Z5func2v
152 .globl _Unwind_SjLj_Resume
153 .globl __gxx_personality_sj0
154 .globl _Unwind_SjLj_Register
155 .globl _Unwind_SjLj_Unregister
156  .section .rodata
157 .LC2:
158  .string "func1 begin"
159 .LC3:
160  .string "func1 end"
161  .text
162  .align 2
163 .globl _Z5func1v
164  .type _Z5func1v, @function
165 _Z5func1v:
166  pushl %ebp
167  movl %esp, %ebp
168  pushl %edi
169  pushl %esi
170  pushl %ebx
171  subl $108, %esp
172  movl $__gxx_personality_sj0, -68(%ebp)
173  movl $.LLSDA1420, -64(%ebp)
174  leal -60(%ebp), %eax
175  leal -24(%ebp), %edx
176  movl %edx, (%eax)
177  movl $.L18, %edx
178  movl %edx, 4(%eax)
179  movl %esp, 8(%eax)
180  leal -92(%ebp), %eax
181  movl %eax, (%esp)
182  call _Unwind_SjLj_Register
183  movl $.LC2, 4(%esp)
184  movl $_ZSt4cout, (%esp)
185  movl $-1, -88(%ebp)
186  call _ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc
187  movl $_ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_, 4(%esp)
188  movl %eax, (%esp)
189  call _ZNSolsEPFRSoS_E
190  leal -40(%ebp), %eax
191  movl %eax, (%esp)
192  call _ZN3ObjC1Ev
193  movl $1, -88(%ebp)
194  call _Z5func2v
195  movl $.LC3, 4(%esp)
196  movl $_ZSt4cout, (%esp)
197  call _ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc
198  movl $_ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_, 4(%esp)
199  movl %eax, (%esp)
200  call _ZNSolsEPFRSoS_E
201  jmp .L15
202 .L18:
203  leal 24(%ebp), %ebp
204  movl -84(%ebp), %eax
205  movl %eax, -100(%ebp)
206 .L14:
207  movl -100(%ebp), %edx
208  movl %edx, -96(%ebp)
209  leal -40(%ebp), %eax
210  movl %eax, (%esp)
211  movl $0, -88(%ebp)
212  call _ZN3ObjD1Ev
213  movl -96(%ebp), %eax
214  movl %eax, -100(%ebp)
215 .L16:
216  movl -100(%ebp), %edx
217  movl %edx, (%esp)
218  movl $-1, -88(%ebp)
219  call _Unwind_SjLj_Resume
220 .L15:
221  leal -40(%ebp), %eax
222  movl %eax, (%esp)
223  movl $-1, -88(%ebp)
224  call _ZN3ObjD1Ev
225 .L13:
226  leal -92(%ebp), %eax
227  movl %eax, (%esp)
228  call _Unwind_SjLj_Unregister
229  addl $108, %esp
230  popl %ebx
231  popl %esi
232  popl %edi
233  popl %ebp
234  ret
235  .size _Z5func1v, .-_Z5func1v
236  .section .gcc_except_table,"a",@progbits
237 .LLSDA1420:
238  .byte 0xff
239  .byte 0xff
240  .byte 0x1
241  .uleb128 .LLSDACSE1420-.LLSDACSB1420
242 .LLSDACSB1420:
243  .uleb128 0x0
244  .uleb128 0x0
245 .LLSDACSE1420:
246  .text
247  .section .rodata
248 .LC4:
249  .string "obj dtor\n"
250  .section .gnu.linkonce.t._ZN3ObjD1Ev,"ax",@progbits
251  .align 2
252  .weak _ZN3ObjD1Ev
253  .type _ZN3ObjD1Ev, @function
254 _ZN3ObjD1Ev:
255  pushl %ebp
256  movl %esp, %ebp
257  subl $8, %esp
258  movl $.LC4, (%esp)
259  call printf
260  leave
261  ret
262  .size _ZN3ObjD1Ev, .-_ZN3ObjD1Ev
263  .section .rodata
264 .LC5:
265  .string "obj ctor\n"
266  .section .gnu.linkonce.t._ZN3ObjC1Ev,"ax",@progbits
267  .align 2
268  .weak _ZN3ObjC1Ev
269  .type _ZN3ObjC1Ev, @function
270 _ZN3ObjC1Ev:
271  pushl %ebp
272  movl %esp, %ebp
273  subl $8, %esp
274  movl $.LC5, (%esp)
275  call printf
276  leave
277  ret
278  .size _ZN3ObjC1Ev, .-_ZN3ObjC1Ev
279  .section .rodata
280 .LC6:
281  .string "hello"
282 .LC7:
283  .string "catch int: "
284  .text
285  .align 2
286 .globl main
287  .type main, @function
288 main:
289  pushl %ebp
290  movl %esp, %ebp
291  pushl %edi
292  pushl %esi
293  pushl %ebx
294  subl $92, %esp
295  andl $-16, %esp
296  movl $0, %eax
297  addl $15, %eax
298  addl $15, %eax
299  shrl $4, %eax
300  sall $4, %eax
301  subl %eax, %esp
302  movl $__gxx_personality_sj0, -44(%ebp)
303  movl $.LLSDA1421, -40(%ebp)
304  leal -36(%ebp), %eax
305  leal -12(%ebp), %edx
306  movl %edx, (%eax)
307  movl $.L31, %edx
308  movl %edx, 4(%eax)
309  movl %esp, 8(%eax)
310  leal -68(%ebp), %eax
311  movl %eax, (%esp)
312  call _Unwind_SjLj_Register
313  movl $.LC6, 4(%esp)
314  movl $_ZSt4cout, (%esp)
315  movl $2, -64(%ebp)
316  call _ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc
317  movl $_ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_, 4(%esp)
318  movl %eax, (%esp)
319  call _ZNSolsEPFRSoS_E
320  call _Z5func1v
321  jmp .L24
322 .L30:
323  cmpl $1, -84(%ebp)
324  je .L25
325  movl -76(%ebp), %eax
326  movl %eax, (%esp)
327  movl $-1, -64(%ebp)
328  call _Unwind_SjLj_Resume
329 .L25:
330  movl -76(%ebp), %edx
331  movl %edx, (%esp)
332  movl $-1, -64(%ebp)
333  call __cxa_begin_catch
334  movl (%eax), %eax
335  movl %eax, -16(%ebp)
336  movl $.LC7, 4(%esp)
337  movl $_ZSt4cout, (%esp)
338  movl $1, -64(%ebp)
339  call _ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc
340  movl %eax, %edx
341  movl -16(%ebp), %eax
342  movl %eax, 4(%esp)
343  movl %edx, (%esp)
344  call _ZNSolsEi
345  movl $_ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_, 4(%esp)
346  movl %eax, (%esp)
347  call _ZNSolsEPFRSoS_E
348  jmp .L27
349 .L31:
350  leal 12(%ebp), %ebp
351  movl -64(%ebp), %eax
352  movl -60(%ebp), %edx
353  movl %edx, -76(%ebp)
354  movl -56(%ebp), %edx
355  movl %edx, -84(%ebp)
356  cmpl $1, %eax
357  je .L30
358 .L26:
359  movl -76(%ebp), %eax
360  movl %eax, -80(%ebp)
361  call __cxa_end_catch
362  movl -80(%ebp), %edx
363  movl %edx, -76(%ebp)
364 .L28:
365  movl -76(%ebp), %eax
366  movl %eax, (%esp)
367  movl $-1, -64(%ebp)
368  call _Unwind_SjLj_Resume
369 .L27:
370  call __cxa_end_catch
371 .L24:
372  movl $0, -72(%ebp)
373 .L23:
374  leal -68(%ebp), %eax
375  movl %eax, (%esp)
376  call _Unwind_SjLj_Unregister
377  movl -72(%ebp), %eax
378  leal -12(%ebp), %esp
379  popl %ebx
380  popl %esi
381  popl %edi
382  popl %ebp
383  ret
384  .size main, .-main
385  .section .gcc_except_table
386  .align 4
387 .LLSDA1421:
388  .byte 0xff
389  .byte 0x0
390  .uleb128 .LLSDATT1421-.LLSDATTD1421
391 .LLSDATTD1421:
392  .byte 0x1
393  .uleb128 .LLSDACSE1421-.LLSDACSB1421
394 .LLSDACSB1421:
395  .uleb128 0x0
396  .uleb128 0x0
397  .uleb128 0x1
398  .uleb128 0x1
399 .LLSDACSE1421:
400  .byte 0x1
401  .byte 0x0
402  .align 4
403  .long _ZTIi
404 .LLSDATT1421:
405  .text
406  .section .gnu.linkonce.t._ZSt3minIjERKT_S2_S2_,"ax",@progbits
407  .align 2
408  .weak _ZSt3minIjERKT_S2_S2_
409  .type _ZSt3minIjERKT_S2_S2_, @function
430  .text
431  .align 2
454  .align 2
455  .type _GLOBAL__I_my_lsda, @function
456 _GLOBAL__I_my_lsda:
457  pushl %ebp
458  movl %esp, %ebp
459  subl $8, %esp
460  movl $65535, 4(%esp)
461  movl $1, (%esp)
462  call _Z41__static_initialization_and_destruction_0ii
463  leave
464  ret
465  .size _GLOBAL__I_my_lsda, .-_GLOBAL__I_my_lsda
466  .section .ctors,"aw",@progbits
467  .align 4
468  .long _GLOBAL__I_my_lsda
469  .text
470  .align 2
471  .type _GLOBAL__D_my_lsda, @function
472 _GLOBAL__D_my_lsda:
473  pushl %ebp
474  movl %esp, %ebp
475  subl $8, %esp
476  movl $65535, 4(%esp)
477  movl $0, (%esp)
478  call _Z41__static_initialization_and_destruction_0ii
479  leave
480  ret
481  .size _GLOBAL__D_my_lsda, .-_GLOBAL__D_my_lsda
482  .section .dtors,"aw",@progbits
483  .align 4
484  .long _GLOBAL__D_my_lsda
494  .section .note.GNU-stack,"",@progbits
495  .ident "GCC: (GNU) 3.4.4"
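The call-site tables in list 3 are short enough to walk by hand. The sketch below reflects how I understand the sjlj case, so treat it as an assumption rather than the actual libsupc++ code: the value stored in call_site before a call (the movl $2 at L315, the movl $1 at L193) is a 1-based index into pairs of (landing pad, action), the landing pad reported is the stored value plus 1, and action 0 means cleanup only. CallSiteHit and lookup_sjlj_call_site are made-up names.

```cpp
#include <cstdint>

// Result of looking up one call-site index.
struct CallSiteHit
{
  std::uint64_t landing_pad;  // stored landing pad value + 1
  std::uint64_t action;       // 0 = cleanup only, otherwise 1-based action table offset
};

// Read one unsigned LEB128 number and advance p past it.
static std::uint64_t read_uleb128(const unsigned char *&p)
{
  std::uint64_t result = 0;
  unsigned shift = 0;
  unsigned char byte;
  do
  {
    byte = *p++;
    result |= static_cast<std::uint64_t>(byte & 0x7f) << shift;
    shift += 7;
  } while (byte & 0x80);
  return result;
}

// Walk to the call_site-th pair (call_site must be >= 1). The entries are
// variable length, so they have to be read one after another.
static CallSiteHit lookup_sjlj_call_site(const unsigned char *table, int call_site)
{
  const unsigned char *p = table;
  std::uint64_t cs_lp = 0, cs_action = 0;
  for (int i = 0; i < call_site; ++i)
  {
    cs_lp = read_uleb128(p);
    cs_action = read_uleb128(p);
  }
  CallSiteHit hit = { cs_lp + 1, cs_action };
  return hit;
}

static const unsigned char main_call_sites[] = {0x0, 0x0, 0x1, 0x1};  // list 3 L395 ~ L398
static const unsigned char func1_call_sites[] = {0x0, 0x0};           // list 3 L243 ~ L244
```

Under these assumptions main's call to func1 (call_site 2) gets an entry with a non-zero action, i.e. a catch clause to test, while func1's call to func2 (call_site 1) gets action 0, i.e. only the Obj cleanup.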

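type_info is used to check the type, which is why exception handling needs rtti support.

From global objects, static objects, virtual functions and rtti to exception handling, now you know how much black magic c++ contains. c++ is really something, and this is also what it gets criticized for: too much of a black box.

Of all the features in c++, the implementations of virtual functions and exception handling interest me most. I have been looking for material for years, so it feels great to finally get somewhere.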

typeid ref:
  • typeid详解
  • 執行時期型態資訊(RTTI)
  • A General-Purpose Run-Time Type Information System for C++
  • http://www.cs.rug.nl/~alext/SOFTWARE/RTTI/rtti_doc.html
  • https://pdfs.semanticscholar.org/ca44/58d8cb126fe6eae8f19cab6efb9b9fe47c88.pdf
ref:
Visual C++ exception handling:

dwarf: