Hello world: C, assembly, object file and executable
Summary: In this article we'll take a C program that prints "Hello World!" to the screen, compile it to assembly, assemble it into an object file, and link it into an executable. Then we'll compare the results of each step and try to show what's happening behind the scenes. Specifically, we will look at which sections are present at each stage of the transformation chain: from C code, to assembly code, to object file, to executable.
Hello world program: The assembly
First we need to write the hello world C program, which can be seen below:
[cpp]
#include <stdio.h>
int main() {
    printf("Hello World!");
    return 0;
}
[/cpp]
It's a very simple program that does nothing beyond printing a string; we intentionally kept it this simple so that we can focus on the bigger picture rather than on lots of code. First we compile the program only as far as the assembly code; we don't want to assemble or link anything yet. To do that we pass the -S option to gcc, which takes the source code of the program and stops after generating the assembly instructions. We also want the assembly in Intel syntax rather than gcc's default AT&T syntax, which we can achieve by passing -masm=intel to gcc. If we're on a 64-bit operating system, we also want to compile the program as 32-bit, which we can achieve by passing the -m32 argument to gcc. The whole gcc command that we're using can be seen below:
[bash]
# gcc -m32 -masm=intel -S hello.c -o hello.s
[/bash]
This command takes the hello.c program and compiles it as a 32-bit program into assembly instructions, which are saved in the hello.s file.
The hello.s file now looks as follows:
[plain]
    .file "hello.c"
    .intel_syntax noprefix
    .section .rodata
.LC0:
    .string "Hello World!"
    .text
    .globl main
    .type main, @function
main:
    push ebp
    mov ebp, esp
    and esp, -16
    sub esp, 16
    mov eax, OFFSET FLAT:.LC0
    mov DWORD PTR [esp], eax
    call printf
    mov eax, 0
    leave
    ret
    .size main, .-main
    .ident "GCC: (Gentoo 4.5.4 p1.0, pie-0.4.7) 4.5.4"
    .section .note.GNU-stack,"",@progbits
[/plain]
The .file directive records the original source file name, which is normally used by debuggers. The .intel_syntax line specifies that we're using Intel syntax assembly and not AT&T syntax. Next comes the .rodata section, which is used for read-only data. In our case the .rodata section contains only the zero-terminated string "Hello World!", which can be referenced through the .LC0 label. Then we switch to the .text section, which is used for the code of the program.
First we must define the main function (notice the .type main,@function directive), which is globally visible (notice the .globl main directive). Everything from the main: label to the ret instruction is the actual code of the program. That code first sets up the stack frame by pushing the value of the EBP register onto the stack and copying the value of ESP into EBP. The "and esp,-16" instruction rounds the stack pointer down to a multiple of 16 bytes, because some operations are faster (or only possible) when the stack is 16-byte aligned. It is not the result of an optimization flag: we passed no -O option, so gcc uses its default of -O0, and gcc aligns the stack in main this way regardless. Then we subtract 16 bytes from ESP to reserve stack space; this program has no local variables, so the space is used only for the outgoing argument to printf. Next, the address of .LC0 (our "Hello World!" string) is loaded into the EAX register and stored at the top of the stack, where it becomes the first and only argument to the printf function that is called right after. The printf function prints the string on the screen and returns to main, which sets the return value to 0 in EAX, tears down the stack frame with leave and returns with ret.
The .size directive records the size of the main function in the symbol table of the object file. The expression .-main is the current location (the dot) minus the address of the main label, which is exactly the number of bytes the function occupies. The .ident directive stores the "GCC: (Gentoo 4.5.4 p1.0, pie-0.4.7) 4.5.4" string in the object file (it ends up in the .comment section) to record which compiler was used.
Hello world program: The object file
We've seen the assembly code that gcc generated directly from the corresponding C source code. But we can't run the program until it has been assembled and linked. To assemble the assembly code into an object file, we use the -c option of gcc, which compiles or assembles the input file but does not link it. To obtain the object file from the assembly code we need to run the command below:
[bash]
# gcc -m32 -masm=intel -c hello.s -o hello.o
# file hello.o
hello.o: ELF 32-bit LSB relocatable, Intel 80386, version 1 (SYSV), not stripped
[/bash]
We can see that hello.o is an ELF 32-bit relocatable object file, which is not linked yet. If we try to run it, it fails as shown below:
[bash]
# chmod +x hello.o
# ./hello.o
bash: ./hello.o: cannot execute binary file
[/bash]
We can read the contents of the object file with the readelf program as follows:
[plain]
# readelf -a hello.o
ELF Header:
Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
Class: ELF32
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: REL (Relocatable file)
Machine: Intel 80386
Version: 0x1
Entry point address: 0x0
Start of program headers: 0 (bytes into file)
Start of section headers: 224 (bytes into file)
Flags: 0x0
Size of this header: 52 (bytes)
Size of program headers: 0 (bytes)
Number of program headers: 0
Size of section headers: 40 (bytes)
Number of section headers: 11
Section header string table index: 8
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
[ 0] NULL 00000000 000000 000000 00 0 0 0
[ 1] .text PROGBITS 00000000 000034 00001d 00 AX 0 0 4
[ 2] .rel.text REL 00000000 000350 000010 08 9 1 4
[ 3] .data PROGBITS 00000000 000054 000000 00 WA 0 0 4
[ 4] .bss NOBITS 00000000 000054 000000 00 WA 0 0 4
[ 5] .rodata PROGBITS 00000000 000054 00000d 00 A 0 0 1
[ 6] .comment PROGBITS 00000000 000061 00002b 01 MS 0 0 1
[ 7] .note.GNU-stack PROGBITS 00000000 00008c 000000 00 0 0 1
[ 8] .shstrtab STRTAB 00000000 00008c 000051 00 0 0 1
[ 9] .symtab SYMTAB 00000000 000298 0000a0 10 10 8 4
[10] .strtab STRTAB 00000000 000338 000015 00 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings)
I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
O (extra OS processing required) o (OS specific), p (processor specific)
There are no section groups in this file.
There are no program headers in this file.
Relocation section '.rel.text' at offset 0x350 contains 2 entries:
Offset Info Type Sym.Value Sym. Name
0000000a 00000501 R_386_32 00000000 .rodata
00000012 00000902 R_386_PC32 00000000 printf
There are no unwind sections in this file.
Symbol table '.symtab' contains 10 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 00000000 0 NOTYPE LOCAL DEFAULT UND
1: 00000000 0 FILE LOCAL DEFAULT ABS hello.c
2: 00000000 0 SECTION LOCAL DEFAULT 1
3: 00000000 0 SECTION LOCAL DEFAULT 3
4: 00000000 0 SECTION LOCAL DEFAULT 4
5: 00000000 0 SECTION LOCAL DEFAULT 5
6: 00000000 0 SECTION LOCAL DEFAULT 7
7: 00000000 0 SECTION LOCAL DEFAULT 6
8: 00000000 29 FUNC GLOBAL DEFAULT 1 main
9: 00000000 0 NOTYPE GLOBAL DEFAULT UND printf
No version information found in this file.
[/plain]
We can see that the file is an ELF object file that has 11 section headers. The first section header is null. The second section header is .text, which contains the executable instructions of the program. The .rel.text section holds the relocation information for the .text section. The relocation entries must be present because our instructions refer to things whose final addresses are not known yet, such as the external printf function; those references are filled in when the file is linked.
In the output above, we can see that .rel.text holds two relocation entries: one for .rodata (the address of our string) and one for printf. The .data section holds initialized data, while the .bss section holds uninitialized data that the program uses; both are empty here. The .rodata section holds read-only data that can be used by the program; this is where our "Hello World!" string is stored. The .comment section holds the compiler identification string written by the .ident directive, and .note.GNU-stack is an empty marker section indicating that the program does not need an executable stack. The .shstrtab section holds the section names, .strtab holds the symbol name strings and .symtab holds the symbol table.
We can see that the assembly code defined only the .rodata and .text sections (plus the empty .note.GNU-stack marker), but when we translated the assembly code into the object file, quite a few sections were added. Those sections are needed to successfully link the executable and properly execute the program.
Hello world program: The executable
The last step is to actually link the object file to make an executable. To do that, we must execute the command below:
[plain]
# gcc -m32 hello.o -o hello
# ./hello
Hello World!
[/plain]
We've linked the object file hello.o into the executable ./hello and executed it. Upon execution, the program printed the "Hello World!" string as it should. If we run readelf on the executable, we can see that a lot of other information and many more sections have been added, as shown below:
[plain]
$ readelf -a hello
ELF Header:
Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
Class: ELF32
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: Intel 80386
Version: 0x1
Entry point address: 0x8048330
Start of program headers: 52 (bytes into file)
Start of section headers: 4392 (bytes into file)
Flags: 0x0
Size of this header: 52 (bytes)
Size of program headers: 32 (bytes)
Number of program headers: 10
Size of section headers: 40 (bytes)
Number of section headers: 30
Section header string table index: 27
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
[ 0] NULL 00000000 000000 000000 00 0 0 0
[ 1] .interp PROGBITS 08048174 000174 000013 00 A 0 0 1
[ 2] .note.ABI-tag NOTE 08048188 000188 000020 00 A 0 0 4
[ 3] .hash HASH 080481a8 0001a8 000028 04 A 5 0 4
[ 4] .gnu.hash GNU_HASH 080481d0 0001d0 000020 04 A 5 0 4
[ 5] .dynsym DYNSYM 080481f0 0001f0 000050 10 A 6 1 4
[ 6] .dynstr STRTAB 08048240 000240 00004c 00 A 0 0 1
[ 7] .gnu.version VERSYM 0804828c 00028c 00000a 02 A 5 0 2
[ 8] .gnu.version_r VERNEED 08048298 000298 000020 00 A 6 1 4
[ 9] .rel.dyn REL 080482b8 0002b8 000008 08 A 5 0 4
[10] .rel.plt REL 080482c0 0002c0 000018 08 A 5 12 4
[11] .init PROGBITS 080482d8 0002d8 000017 00 AX 0 0 4
[12] .plt PROGBITS 080482f0 0002f0 000040 04 AX 0 0 16
[13] .text PROGBITS 08048330 000330 00019c 00 AX 0 0 16
[14] .fini PROGBITS 080484cc 0004cc 00001c 00 AX 0 0 4
[15] .rodata PROGBITS 080484e8 0004e8 000015 00 A 0 0 4
[16] .eh_frame_hdr PROGBITS 08048500 000500 000014 00 A 0 0 4
[17] .eh_frame PROGBITS 08048514 000514 000040 00 A 0 0 4
[18] .ctors PROGBITS 08049f0c 000f0c 000008 00 WA 0 0 4
[19] .dtors PROGBITS 08049f14 000f14 000008 00 WA 0 0 4
[20] .jcr PROGBITS 08049f1c 000f1c 000004 00 WA 0 0 4
[21] .dynamic DYNAMIC 08049f20 000f20 0000d0 08 WA 6 0 4
[22] .got PROGBITS 08049ff0 000ff0 000004 04 WA 0 0 4
[23] .got.plt PROGBITS 08049ff4 000ff4 000018 04 WA 0 0 4
[24] .data PROGBITS 0804a00c 00100c 000008 00 WA 0 0 4
[25] .bss NOBITS 0804a014 001014 000008 00 WA 0 0 4
[26] .comment PROGBITS 00000000 001014 00002a 01 MS 0 0 1
[27] .shstrtab STRTAB 00000000 00103e 0000e9 00 0 0 1
[28] .symtab SYMTAB 00000000 0015d8 000340 10 29 32 4
[29] .strtab STRTAB 00000000 001918 00014d 00 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings)
I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
O (extra OS processing required) o (OS specific), p (processor specific)
There are no section groups in this file.
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
PHDR 0x000034 0x08048034 0x08048034 0x00140 0x00140 R E 0x4
INTERP 0x000174 0x08048174 0x08048174 0x00013 0x00013 R 0x1
[Requesting program interpreter: /lib/ld-linux.so.2]
LOAD 0x000000 0x08048000 0x08048000 0x00554 0x00554 R E 0x1000
LOAD 0x000f0c 0x08049f0c 0x08049f0c 0x00108 0x00110 RW 0x1000
DYNAMIC 0x000f20 0x08049f20 0x08049f20 0x000d0 0x000d0 RW 0x4
NOTE 0x000188 0x08048188 0x08048188 0x00020 0x00020 R 0x4
GNU_EH_FRAME 0x000500 0x08048500 0x08048500 0x00014 0x00014 R 0x4
GNU_STACK 0x000000 0x00000000 0x00000000 0x00000 0x00000 RW 0x4
GNU_RELRO 0x000f0c 0x08049f0c 0x08049f0c 0x000f4 0x000f4 R 0x1
PAX_FLAGS 0x000000 0x00000000 0x00000000 0x00000 0x00000 0x4
Section to Segment mapping:
Segment Sections...
00
01 .interp
02 .interp .note.ABI-tag .hash .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rel.dyn .rel.plt .init .plt .text .fini .rodata .eh_frame_hdr .eh_frame
03 .ctors .dtors .jcr .dynamic .got .got.plt .data .bss
04 .dynamic
05 .note.ABI-tag
06 .eh_frame_hdr
07
08 .ctors .dtors .jcr .dynamic .got
09
Dynamic section at offset 0xf20 contains 21 entries:
Tag Type Name/Value
0x00000001 (NEEDED) Shared library: [libc.so.6]
0x0000000c (INIT) 0x80482d8
0x0000000d (FINI) 0x80484cc
0x00000004 (HASH) 0x80481a8
0x6ffffef5 (GNU_HASH) 0x80481d0
0x00000005 (STRTAB) 0x8048240
0x00000006 (SYMTAB) 0x80481f0
0x0000000a (STRSZ) 76 (bytes)
0x0000000b (SYMENT) 16 (bytes)
0x00000015 (DEBUG) 0x0
0x00000003 (PLTGOT) 0x8049ff4
0x00000002 (PLTRELSZ) 24 (bytes)
0x00000014 (PLTREL) REL
0x00000017 (JMPREL) 0x80482c0
0x00000011 (REL) 0x80482b8
0x00000012 (RELSZ) 8 (bytes)
0x00000013 (RELENT) 8 (bytes)
0x6ffffffe (VERNEED) 0x8048298
0x6fffffff (VERNEEDNUM) 1
0x6ffffff0 (VERSYM) 0x804828c
0x00000000 (NULL) 0x0
Relocation section '.rel.dyn' at offset 0x2b8 contains 1 entries:
Offset Info Type Sym.Value Sym. Name
08049ff0 00000206 R_386_GLOB_DAT 00000000 __gmon_start__
Relocation section '.rel.plt' at offset 0x2c0 contains 3 entries:
Offset Info Type Sym.Value Sym. Name
0804a000 00000107 R_386_JUMP_SLOT 00000000 printf
0804a004 00000207 R_386_JUMP_SLOT 00000000 __gmon_start__
0804a008 00000307 R_386_JUMP_SLOT 00000000 __libc_start_main
There are no unwind sections in this file.
Symbol table '.dynsym' contains 5 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 00000000 0 NOTYPE LOCAL DEFAULT UND
1: 00000000 0 FUNC GLOBAL DEFAULT UND printf@GLIBC_2.0 (2)
2: 00000000 0 NOTYPE WEAK DEFAULT UND __gmon_start__
3: 00000000 0 FUNC GLOBAL DEFAULT UND __libc_start_main@GLIBC_2.0 (2)
4: 080484ec 4 OBJECT GLOBAL DEFAULT 15 _IO_stdin_used
Symbol table '.symtab' contains 52 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 00000000 0 NOTYPE LOCAL DEFAULT UND
1: 08048174 0 SECTION LOCAL DEFAULT 1
2: 08048188 0 SECTION LOCAL DEFAULT 2
3: 080481a8 0 SECTION LOCAL DEFAULT 3
4: 080481d0 0 SECTION LOCAL DEFAULT 4
5: 080481f0 0 SECTION LOCAL DEFAULT 5
6: 08048240 0 SECTION LOCAL DEFAULT 6
7: 0804828c 0 SECTION LOCAL DEFAULT 7
8: 08048298 0 SECTION LOCAL DEFAULT 8
9: 080482b8 0 SECTION LOCAL DEFAULT 9
10: 080482c0 0 SECTION LOCAL DEFAULT 10
11: 080482d8 0 SECTION LOCAL DEFAULT 11
12: 080482f0 0 SECTION LOCAL DEFAULT 12
13: 08048330 0 SECTION LOCAL DEFAULT 13
14: 080484cc 0 SECTION LOCAL DEFAULT 14
15: 080484e8 0 SECTION LOCAL DEFAULT 15
16: 08048500 0 SECTION LOCAL DEFAULT 16
17: 08048514 0 SECTION LOCAL DEFAULT 17
18: 08049f0c 0 SECTION LOCAL DEFAULT 18
19: 08049f14 0 SECTION LOCAL DEFAULT 19
20: 08049f1c 0 SECTION LOCAL DEFAULT 20
21: 08049f20 0 SECTION LOCAL DEFAULT 21
22: 08049ff0 0 SECTION LOCAL DEFAULT 22
23: 08049ff4 0 SECTION LOCAL DEFAULT 23
24: 0804a00c 0 SECTION LOCAL DEFAULT 24
25: 0804a014 0 SECTION LOCAL DEFAULT 25
26: 00000000 0 SECTION LOCAL DEFAULT 26
27: 00000000 0 FILE LOCAL DEFAULT ABS hello.c
28: 08049f0c 0 NOTYPE LOCAL DEFAULT 18 __init_array_end
29: 08049f20 0 OBJECT LOCAL DEFAULT 21 _DYNAMIC
30: 08049f0c 0 NOTYPE LOCAL DEFAULT 18 __init_array_start
31: 08049ff4 0 OBJECT LOCAL DEFAULT 23 _GLOBAL_OFFSET_TABLE_
32: 08048490 5 FUNC GLOBAL DEFAULT 13 __libc_csu_fini
33: 08048495 0 FUNC GLOBAL HIDDEN 13 __i686.get_pc_thunk.bx
34: 0804a00c 0 NOTYPE WEAK DEFAULT 24 data_start
35: 00000000 0 FUNC GLOBAL DEFAULT UND printf@@GLIBC_2.0
36: 0804a014 0 NOTYPE GLOBAL DEFAULT ABS _edata
37: 080484cc 0 FUNC GLOBAL DEFAULT 14 _fini
38: 08049f18 0 OBJECT GLOBAL HIDDEN 19 __DTOR_END__
39: 0804a00c 0 NOTYPE GLOBAL DEFAULT 24 __data_start
40: 00000000 0 NOTYPE WEAK DEFAULT UND __gmon_start__
41: 0804a010 0 OBJECT GLOBAL HIDDEN 24 __dso_handle
42: 080484ec 4 OBJECT GLOBAL DEFAULT 15 _IO_stdin_used
43: 00000000 0 FUNC GLOBAL DEFAULT UND __libc_start_main@@GLIBC_
44: 08048430 90 FUNC GLOBAL DEFAULT 13 __libc_csu_init
45: 0804a01c 0 NOTYPE GLOBAL DEFAULT ABS _end
46: 08048330 0 FUNC GLOBAL DEFAULT 13 _start
47: 080484e8 4 OBJECT GLOBAL DEFAULT 15 _fp_hw
48: 0804a014 0 NOTYPE GLOBAL DEFAULT ABS __bss_start
49: 08048404 29 FUNC GLOBAL DEFAULT 13 main
50: 00000000 0 NOTYPE WEAK DEFAULT UND _Jv_RegisterClasses
51: 080482d8 0 FUNC GLOBAL DEFAULT 11 _init
Histogram for bucket list length (total of 3 buckets):
Length Number % of total Coverage
0 0 ( 0.0%)
1 2 ( 66.7%) 50.0%
2 1 ( 33.3%) 100.0%
Histogram for `.gnu.hash' bucket list length (total of 2 buckets):
Length Number % of total Coverage
0 1 ( 50.0%)
1 1 ( 50.0%) 100.0%
Version symbols section '.gnu.version' contains 5 entries:
Addr: 000000000804828c Offset: 0x00028c Link: 5 (.dynsym)
000: 0 (*local*) 2 (GLIBC_2.0) 0 (*local*) 2 (GLIBC_2.0)
004: 1 (*global*)
Version needs section '.gnu.version_r' contains 1 entries:
Addr: 0x0000000008048298 Offset: 0x000298 Link: 6 (.dynstr)
000000: Version: 1 File: libc.so.6 Cnt: 1
0x0010: Name: GLIBC_2.0 Flags: none Version: 2
Notes at offset 0x00000188 with length 0x00000020:
Owner Data size Description
GNU 0x00000010 NT_GNU_ABI_TAG (ABI version tag)
OS: Linux, ABI: 2.6.9
[/plain]
Conclusion
We've now seen how a simple program written in C is converted into assembly code, then into an object file and finally into an executable file. While the C code has no notion of sections, the assembly code already defined two of them: .rodata and .text. When we assembled it into an object file and then linked it into the executable, more and more sections were added, all of which are needed for the program to be linked and executed successfully.