Who calls main in a C program on Linux?

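A long time ago I heard that when Linux runs an executable, the real starting point is not main. I had also noticed that the link step quietly pulls in a few object files I had never seen before, so this time let's find out what is really going on.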

Test environment

Because I wanted to show off a little and get some exposure to ARM assembly along the way, this test runs ARM Debian under QEMU.

$ lsb_release -a
No LSB modules are available.
Distributor ID:	Debian
Description:	Debian GNU/Linux 8.0 (jessie)
Release:	8.0
Codename:	jessie

$ file /bin/ls
/bin/ls: ELF 32-bit LSB executable, ARM, EABI5 version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.32, BuildID[sha1]=571db48d9c9e4625b7da206e748e41c237f2b202, stripped

The test source is the familiar Hello World:

#include <stdio.h>

int main()
{
	printf("Hello World\n");

	return 0;
}

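You may remember from an earlier post that an executable has a .text section, which holds the machine code to be executed. Let's first see where the hello1 executable starts.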

$ readelf -h hello1
ELF Header:
  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF32
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           ARM
  Version:                           0x1
  Entry point address:               0x102f0
  Start of program headers:          52 (bytes into file)
...
Section header string table index: 33

readelf shows that the entry point is 0x102f0. So where is 0x102f0? Looking at the symbol table, it happens to be exactly the start of .text.

$ objdump -t hello1

hello1:     file format elf32-littlearm

SYMBOL TABLE:
00010134 l    d  .interp	00000000              .interp
...
000102f0 l    d  .text	00000000              .text
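
The entry point that readelf prints comes from the e_entry field of the ELF header, so it can also be read with a few lines of C. The following is a minimal sketch, assuming a Linux host that provides <elf.h> and a file whose byte order matches the host; read_elf_entry is a name made up for this example, not part of any library.

```c
#include <elf.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Reads the entry point of an ELF file (32-bit or 64-bit class).
 * Returns 0 and stores the address in *entry on success, -1 on failure.
 * Assumption: the file's byte order is the same as the host's. */
int read_elf_entry(const char *path, uint64_t *entry)
{
    unsigned char buf[sizeof(Elf64_Ehdr)];
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;
    size_t n = fread(buf, 1, sizeof buf, fp);
    fclose(fp);

    /* Need at least a full 32-bit header that starts with "\177ELF". */
    if (n < sizeof(Elf32_Ehdr) || memcmp(buf, ELFMAG, SELFMAG) != 0)
        return -1;

    if (buf[EI_CLASS] == ELFCLASS32) {
        Elf32_Ehdr eh;
        memcpy(&eh, buf, sizeof eh);
        *entry = eh.e_entry;
        return 0;
    }
    if (buf[EI_CLASS] == ELFCLASS64 && n == sizeof(Elf64_Ehdr)) {
        Elf64_Ehdr eh;
        memcpy(&eh, buf, sizeof eh);
        *entry = eh.e_entry;
        return 0;
    }
    return -1;
}
```

Run against the hello1 binary on a little-endian host, this should yield the same 0x102f0 that readelf reports.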

Good. So what code sits at the start of .text?

Disassembly of section .text:

000102f0 <_start>:
   102f0:       e3a0b000        mov     fp, #0
   102f4:       e3a0e000        mov     lr, #0
   102f8:       e49d1004        pop     {r1}            ; (ldr r1, [sp], #4)
   102fc:       e1a0200d        mov     r2, sp
   10300:       e52d2004        push    {r2}            ; (str r2, [sp, #-4]!)
   10304:       e52d0004        push    {r0}            ; (str r0, [sp, #-4]!)
   10308:       e59fc010        ldr     ip, [pc, #16]   ; 10320 <_start+0x30>
   1030c:       e52dc004        push    {ip}            ; (str ip, [sp, #-4]!)
   10310:       e59f000c        ldr     r0, [pc, #12]   ; 10324 <_start+0x34>
   10314:       e59f300c        ldr     r3, [pc, #12]   ; 10328 <_start+0x38>
   10318:       ebffffeb        bl      102cc <__libc_start_main@plt>
   1031c:       ebfffff0        bl      102e4 <abort@plt>
   10320:       000104b4        .word   0x000104b4
   10324:       00010420        .word   0x00010420
   10328:       00010448        .word   0x00010448

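Interesting: there is no main() here, only _start. So what is _start? Remember linker scripts? They have an ENTRY command that sets where the program starts running. Let's check whether the default ENTRY is also _start.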

$ ld --verbose | grep ENTRY
ENTRY(_start)

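So far we only know that the executable starts at _start rather than main, which means something added code to the executable that defines _start. The laziest way to track it down is to search the installed binaries for that symbol.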

user@host:/usr/lib$ find -name "*.[ao]" -exec nm -A {} \;  2> /dev/null | grep " _start$"
./arm-linux-gnueabi/crt1.o:00000000 T _start
./arm-linux-gnueabi/gcrt1.o:00000000 T _start
./arm-linux-gnueabi/Scrt1.o:00000000 T _start
./debug/usr/lib/arm-linux-gnueabi/crt1.o:00000000 T _start
./debug/usr/lib/arm-linux-gnueabi/gcrt1.o:00000000 T _start
./debug/usr/lib/arm-linux-gnueabi/Scrt1.o:00000000 T _start

OK, there really are object files that define _start. Next, let's confirm whether these files get linked in at compile time.

$ gcc -v hello1.c
Using built-in specs.
COLLECT_GCC=gcc
...
COLLECT_GCC_OPTIONS='-v' '-march=armv4t' '-mfloat-abi=soft'
...
-X --hash-style=gnu -m armelf_linux_eabi
...
/usr/lib/gcc/arm-linux-gnueabi/4.9/../../../arm-linux-gnueabi/crt1.o
...

_start calls the external function __libc_start_main. Let's look at it through LD_DEBUG.

$ LD_DEBUG=all ./hello1 2>&1 |grep __libc_start_main
       890:	symbol=__libc_start_main;  lookup in file=./hello1 [0]
       890:	symbol=__libc_start_main;  lookup in file=/lib/arm-linux-gnueabi/libc.so.6 [0]
       890:	binding file ./hello1 [0] to /lib/arm-linux-gnueabi/libc.so.6 [0]: normal symbol `__libc_start_main' [GLIBC_2.4]

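As the output shows, the dynamic linker first looks for __libc_start_main in ./hello1, then in libc.so.6, and finally resolves it to the address of __libc_start_main inside libc.so.6 (that is the binding). The prototype of __libc_start_main is: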

int __libc_start_main(int (*main) (int, char **, char **), int argc, char ** ubp_av, void (*init) (void), void (*fini) (void), void (*rtld_fini) (void), void (*stack_end));

See anything interesting? Here is what I see:

  • main is passed in as a function pointer
  • the arguments for main
  • a few other function pointers I don't recognize yet
    • init
    • fini
    • rtld_fini

From this I can guess that this function calls a series of callbacks, namely the function pointers listed above.
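
One way to watch startup and shutdown callbacks fire from ordinary C is the constructor and destructor function attributes. The following is a minimal sketch, assuming GCC or Clang on Linux; before_main, after_main and constructor_has_run are names invented for this example. Exactly which hook dispatches them differs between glibc versions, so treat any mapping onto init, fini or rtld_fini as an assumption rather than a fact shown here.

```c
#include <stdio.h>

static int ctor_calls;  /* how many times the constructor has run */

/* GCC/Clang extension: the C runtime calls this during startup,
 * before control reaches main. */
__attribute__((constructor))
static void before_main(void)
{
    ctor_calls++;
    puts("before_main: runs before main");
}

/* Counterpart that the runtime calls during exit processing,
 * after main has returned. */
__attribute__((destructor))
static void after_main(void)
{
    puts("after_main: runs after main returns");
}

/* Lets other code check that the constructor has already run. */
int constructor_has_run(void)
{
    return ctor_calls > 0;
}
```

Linked into any program, constructor_has_run() is already true on the first line of main, which is consistent with main not being the first code to execute.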

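According to the manual, __libc_start_main() initializes the execution environment, calls main with its arguments, and handles the return value when main finishes. The detailed behavior the manual lists is:

  • Check permissions for security
  • Initialize the thread subsystem (I have no idea what the thread subsystem is yet)
  • Register rtld_fini as a cleanup callback, used to release resources when the dynamic shared object exits
  • Call the init callback
  • Call the main callback with its arguments
  • When the main callback returns, call exit with the return value as its argument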

Let's go back to the _start assembly:

000102f0 <_start>:
   102f0:       e3a0b000        mov     fp, #0
   102f4:       e3a0e000        mov     lr, #0
   102f8:       e49d1004        pop     {r1}            ; (ldr r1, [sp], #4)
   102fc:       e1a0200d        mov     r2, sp
   10300:       e52d2004        push    {r2}            ; (str r2, [sp, #-4]!)
   10304:       e52d0004        push    {r0}            ; (str r0, [sp, #-4]!)
   10308:       e59fc010        ldr     ip, [pc, #16]   ; 10320 <_start+0x30>
   1030c:       e52dc004        push    {ip}            ; (str ip, [sp, #-4]!)
   10310:       e59f000c        ldr     r0, [pc, #12]   ; 10324 <_start+0x34>
   10314:       e59f300c        ldr     r3, [pc, #12]   ; 10328 <_start+0x38>
   10318:       ebffffeb        bl      102cc <__libc_start_main@plt>
   1031c:       ebfffff0        bl      102e4 <abort@plt>
   10320:       000104b4        .word   0x000104b4
   10324:       00010420        .word   0x00010420
   10328:       00010448        .word   0x00010448

The interesting part is these three words:

   10320:       000104b4        .word   0x000104b4
   10324:       00010420        .word   0x00010420
   10328:       00010448        .word   0x00010448

These three addresses turn out to be:

  • 10320: 000104b4 .word 0x000104b4
    • __libc_csu_fini
  • 10324: 00010420 .word 0x00010420
    • main
  • 10328: 00010448 .word 0x00010448
    • __libc_csu_init
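
How _start reaches these three words, and the two bl targets, follows from ARM's pc-relative addressing: in ARM (A32) state, reading pc gives the address of the current instruction plus 8. The arithmetic can be checked with a short sketch; arm_ldr_literal_addr and arm_branch_target are hypothetical helper names, and they assume the instruction word really is a pc-relative ldr or a b/bl, since the opcode itself is not validated.

```c
#include <stdint.h>

/* In ARM (A32) state, reading pc yields the instruction's address plus 8. */
#define ARM_PC_BIAS 8u

/* Address read by "ldr Rt, [pc, #imm12]" located at insn_addr. */
uint32_t arm_ldr_literal_addr(uint32_t insn_addr, uint32_t insn)
{
    uint32_t imm12 = insn & 0xfffu;        /* bits 11..0: byte offset */
    uint32_t base  = insn_addr + ARM_PC_BIAS;
    int      add   = (insn >> 23) & 1u;    /* U bit: 1 = add, 0 = subtract */
    return add ? base + imm12 : base - imm12;
}

/* Destination of a "b" or "bl" instruction located at insn_addr. */
uint32_t arm_branch_target(uint32_t insn_addr, uint32_t insn)
{
    int32_t imm24 = (int32_t)(insn & 0x00ffffffu);
    if (imm24 & 0x00800000)
        imm24 -= 0x01000000;               /* sign-extend the 24-bit field */
    return insn_addr + ARM_PC_BIAS + (uint32_t)(imm24 * 4);
}
```

For the instruction e59fc010 at 0x10308 this gives 0x10308 + 8 + 16 = 0x10320, and for ebffffeb at 0x10318 it gives 0x10320 - 84 = 0x102cc, matching the objdump annotations.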

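In other words, main and __libc_csu_init are passed to __libc_start_main as the first and fourth arguments (in r0 and r3), while __libc_csu_fini is pushed onto the stack and reaches __libc_start_main that way.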
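
The split between registers and stack follows the 32-bit ARM procedure call convention: the first four word-sized arguments travel in r0 to r3 and the rest go on the stack in order. A minimal sketch of that rule applied to the seven parameters of __libc_start_main; the helper names are made up for this example, and the rule as coded only covers word-sized arguments.

```c
#include <stdio.h>

/* Parameter names of __libc_start_main, in declaration order. */
static const char *const params[] = {
    "main", "argc", "ubp_av", "init", "fini", "rtld_fini", "stack_end"
};

#define CORE_ARG_REGS 4   /* r0..r3 carry the first four word-sized arguments */
#define WORD_SIZE     4   /* bytes per stack slot on 32-bit ARM */

/* Register number (0..3) holding the argument, or -1 if it is on the stack. */
int arg_register(int index)
{
    return index < CORE_ARG_REGS ? index : -1;
}

/* Byte offset from sp of a stack argument, or -1 if it is in a register. */
int arg_stack_offset(int index)
{
    return index < CORE_ARG_REGS ? -1 : (index - CORE_ARG_REGS) * WORD_SIZE;
}

/* Prints where each argument lives at the moment of the bl instruction. */
void print_arg_locations(void)
{
    for (int i = 0; i < (int)(sizeof params / sizeof params[0]); i++) {
        if (arg_register(i) >= 0)
            printf("%-10s -> r%d\n", params[i], arg_register(i));
        else
            printf("%-10s -> [sp, #%d]\n", params[i], arg_stack_offset(i));
    }
}
```

This lines up with the three pushes in _start: r2 (stack_end) is pushed first and ends up at [sp, #8], r0 (rtld_fini) at [sp, #4], and ip (__libc_csu_fini) last at [sp, #0].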

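Conclusion

On Linux, a program's starting point is not main but _start, which comes from crt1.o, an object file shipped with glibc. _start hands your main and a few hook functions to __libc_start_main. libc's __libc_start_main then sets everything up before it actually runs your main, and it also cleans up after main returns.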

Full disassembly

$ cat hello1.dis

hello1:     file format elf32-littlearm


Disassembly of section .init:

0001029c <_init>:
   1029c:	e92d4008 	push	{r3, lr}
   102a0:	eb000021 	bl	1032c <call_weak_fn>
   102a4:	e8bd4008 	pop	{r3, lr}
   102a8:	e12fff1e 	bx	lr

Disassembly of section .plt:

000102ac <puts@plt-0x14>:
   102ac:	e52de004 	push	{lr}		; (str lr, [sp, #-4]!)
   102b0:	e59fe004 	ldr	lr, [pc, #4]	; 102bc <_init+0x20>
   102b4:	e08fe00e 	add	lr, pc, lr
   102b8:	e5bef008 	ldr	pc, [lr, #8]!
   102bc:	00010318 	.word	0x00010318

000102c0 <puts@plt>:
   102c0:	e28fc600 	add	ip, pc, #0, 12
   102c4:	e28cca10 	add	ip, ip, #16, 20	; 0x10000
   102c8:	e5bcf318 	ldr	pc, [ip, #792]!	; 0x318

000102cc <__libc_start_main@plt>:
   102cc:	e28fc600 	add	ip, pc, #0, 12
   102d0:	e28cca10 	add	ip, ip, #16, 20	; 0x10000
   102d4:	e5bcf310 	ldr	pc, [ip, #784]!	; 0x310

000102d8 <__gmon_start__@plt>:
   102d8:	e28fc600 	add	ip, pc, #0, 12
   102dc:	e28cca10 	add	ip, ip, #16, 20	; 0x10000
   102e0:	e5bcf308 	ldr	pc, [ip, #776]!	; 0x308

000102e4 <abort@plt>:
   102e4:	e28fc600 	add	ip, pc, #0, 12
   102e8:	e28cca10 	add	ip, ip, #16, 20	; 0x10000
   102ec:	e5bcf300 	ldr	pc, [ip, #768]!	; 0x300

Disassembly of section .text:

000102f0 <_start>:
   102f0:	e3a0b000 	mov	fp, #0
   102f4:	e3a0e000 	mov	lr, #0
   102f8:	e49d1004 	pop	{r1}		; (ldr r1, [sp], #4)
   102fc:	e1a0200d 	mov	r2, sp
   10300:	e52d2004 	push	{r2}		; (str r2, [sp, #-4]!)
   10304:	e52d0004 	push	{r0}		; (str r0, [sp, #-4]!)
   10308:	e59fc010 	ldr	ip, [pc, #16]	; 10320 <_start+0x30>
   1030c:	e52dc004 	push	{ip}		; (str ip, [sp, #-4]!)
   10310:	e59f000c 	ldr	r0, [pc, #12]	; 10324 <_start+0x34>
   10314:	e59f300c 	ldr	r3, [pc, #12]	; 10328 <_start+0x38>
   10318:	ebffffeb 	bl	102cc <__libc_start_main@plt>
   1031c:	ebfffff0 	bl	102e4 <abort@plt>
   10320:	000104b4 	.word	0x000104b4
   10324:	00010420 	.word	0x00010420
   10328:	00010448 	.word	0x00010448

0001032c <call_weak_fn>:
   1032c:	e59f3014 	ldr	r3, [pc, #20]	; 10348 <call_weak_fn+0x1c>
   10330:	e59f2014 	ldr	r2, [pc, #20]	; 1034c <call_weak_fn+0x20>
   10334:	e08f3003 	add	r3, pc, r3
   10338:	e7932002 	ldr	r2, [r3, r2]
   1033c:	e3520000 	cmp	r2, #0
   10340:	012fff1e 	bxeq	lr
   10344:	eaffffe3 	b	102d8 <__gmon_start__@plt>
   10348:	00010298 	.word	0x00010298
   1034c:	0000001c 	.word	0x0000001c

00010350 <deregister_tm_clones>:
   10350:	e59f301c 	ldr	r3, [pc, #28]	; 10374 <deregister_tm_clones+0x24>
   10354:	e59f001c 	ldr	r0, [pc, #28]	; 10378 <deregister_tm_clones+0x28>
   10358:	e0603003 	rsb	r3, r0, r3
   1035c:	e3530006 	cmp	r3, #6
   10360:	912fff1e 	bxls	lr
   10364:	e59f3010 	ldr	r3, [pc, #16]	; 1037c <deregister_tm_clones+0x2c>
   10368:	e3530000 	cmp	r3, #0
   1036c:	012fff1e 	bxeq	lr
   10370:	e12fff13 	bx	r3
   10374:	000205ff 	.word	0x000205ff
   10378:	000205fc 	.word	0x000205fc
   1037c:	00000000 	.word	0x00000000

00010380 <register_tm_clones>:
   10380:	e59f1024 	ldr	r1, [pc, #36]	; 103ac <register_tm_clones+0x2c>
   10384:	e59f0024 	ldr	r0, [pc, #36]	; 103b0 <register_tm_clones+0x30>
   10388:	e0601001 	rsb	r1, r0, r1
   1038c:	e1a01141 	asr	r1, r1, #2
   10390:	e0811fa1 	add	r1, r1, r1, lsr #31
   10394:	e1b010c1 	asrs	r1, r1, #1
   10398:	012fff1e 	bxeq	lr
   1039c:	e59f3010 	ldr	r3, [pc, #16]	; 103b4 <register_tm_clones+0x34>
   103a0:	e3530000 	cmp	r3, #0
   103a4:	012fff1e 	bxeq	lr
   103a8:	e12fff13 	bx	r3
   103ac:	000205fc 	.word	0x000205fc
   103b0:	000205fc 	.word	0x000205fc
   103b4:	00000000 	.word	0x00000000

000103b8 <__do_global_dtors_aux>:
   103b8:	e92d4010 	push	{r4, lr}
   103bc:	e59f401c 	ldr	r4, [pc, #28]	; 103e0 <__do_global_dtors_aux+0x28>
   103c0:	e5d43000 	ldrb	r3, [r4]
   103c4:	e3530000 	cmp	r3, #0
   103c8:	1a000002 	bne	103d8 <__do_global_dtors_aux+0x20>
   103cc:	ebffffdf 	bl	10350 <deregister_tm_clones>
   103d0:	e3a03001 	mov	r3, #1
   103d4:	e5c43000 	strb	r3, [r4]
   103d8:	e8bd4010 	pop	{r4, lr}
   103dc:	e12fff1e 	bx	lr
   103e0:	000205fc 	.word	0x000205fc

000103e4 <frame_dummy>:
   103e4:	e92d4008 	push	{r3, lr}
   103e8:	e59f0028 	ldr	r0, [pc, #40]	; 10418 <frame_dummy+0x34>
   103ec:	e5903000 	ldr	r3, [r0]
   103f0:	e3530000 	cmp	r3, #0
   103f4:	1a000001 	bne	10400 <frame_dummy+0x1c>
   103f8:	e8bd4008 	pop	{r3, lr}
   103fc:	eaffffdf 	b	10380 <register_tm_clones>
   10400:	e59f3014 	ldr	r3, [pc, #20]	; 1041c <frame_dummy+0x38>
   10404:	e3530000 	cmp	r3, #0
   10408:	0afffffa 	beq	103f8 <frame_dummy+0x14>
   1040c:	e1a0e00f 	mov	lr, pc
   10410:	e12fff13 	bx	r3
   10414:	eafffff7 	b	103f8 <frame_dummy+0x14>
   10418:	000204e8 	.word	0x000204e8
   1041c:	00000000 	.word	0x00000000

00010420 <main>:
   10420:	e92d4800 	push	{fp, lr}
   10424:	e28db004 	add	fp, sp, #4
   10428:	e59f0014 	ldr	r0, [pc, #20]	; 10444 <main+0x24>
   1042c:	ebffffa3 	bl	102c0 <puts@plt>
   10430:	e3a03000 	mov	r3, #0
   10434:	e1a00003 	mov	r0, r3
   10438:	e24bd004 	sub	sp, fp, #4
   1043c:	e8bd4800 	pop	{fp, lr}
   10440:	e12fff1e 	bx	lr
   10444:	000104c8 	.word	0x000104c8

00010448 <__libc_csu_init>:
   10448:	e92d43f8 	push	{r3, r4, r5, r6, r7, r8, r9, lr}
   1044c:	e59f6058 	ldr	r6, [pc, #88]	; 104ac <__libc_csu_init+0x64>
   10450:	e59f5058 	ldr	r5, [pc, #88]	; 104b0 <__libc_csu_init+0x68>
   10454:	e08f6006 	add	r6, pc, r6
   10458:	e08f5005 	add	r5, pc, r5
   1045c:	e0656006 	rsb	r6, r5, r6
   10460:	e1a07000 	mov	r7, r0
   10464:	e1a08001 	mov	r8, r1
   10468:	e1a09002 	mov	r9, r2
   1046c:	ebffff8a 	bl	1029c <_init>
   10470:	e1b06146 	asrs	r6, r6, #2
   10474:	0a00000a 	beq	104a4 <__libc_csu_init+0x5c>
   10478:	e2455004 	sub	r5, r5, #4
   1047c:	e3a04000 	mov	r4, #0
   10480:	e2844001 	add	r4, r4, #1
   10484:	e5b53004 	ldr	r3, [r5, #4]!
   10488:	e1a00007 	mov	r0, r7
   1048c:	e1a01008 	mov	r1, r8
   10490:	e1a02009 	mov	r2, r9
   10494:	e1a0e00f 	mov	lr, pc
   10498:	e12fff13 	bx	r3
   1049c:	e1540006 	cmp	r4, r6
   104a0:	1afffff6 	bne	10480 <__libc_csu_init+0x38>
   104a4:	e8bd43f8 	pop	{r3, r4, r5, r6, r7, r8, r9, lr}
   104a8:	e12fff1e 	bx	lr
   104ac:	00010088 	.word	0x00010088
   104b0:	00010080 	.word	0x00010080

000104b4 <__libc_csu_fini>:
   104b4:	e12fff1e 	bx	lr

Disassembly of section .fini:

000104b8 <_fini>:
   104b8:	e92d4008 	push	{r3, lr}
   104bc:	e8bd4008 	pop	{r3, lr}
   104c0:	e12fff1e 	bx	lr

