Hello world: C, assembly, object file and executable

January 7, 2013 by Dejan Lukan

Summary: In this article we'll take a look at the C program that prints "Hello World!" to the screen, which we'll compile, assemble and link. Then we'll compare the results and try to show what's happening behind the scenes. Specifically, we will look at which sections are present at each step of the transformation chain: from C code, to assembly code, to object file, to executable.

Hello world program: The assembly

First we need to write the hello world C program, which can be seen below:

[cpp]
#include <stdio.h>

int main() {
    /* Print the greeting; note that there is no trailing newline. */
    printf("Hello World!");
    return 0;
}
[/cpp]

It's a very simple program that does nothing except print a string; we intentionally kept it this simple so that we can focus on the bigger picture and not on tons of code. We first compile the program only as far as the assembly code; we don't want to do anything else right now. To do that we pass the -S option to gcc, which takes the source code of the program and generates the assembly instructions. We also want the assembly in Intel syntax rather than the default AT&T syntax, which we can achieve by passing -masm=intel to gcc. If we're on a 64-bit operating system, we also want to compile the program as 32-bit, which we can achieve by passing the -m32 argument to gcc. The whole gcc command that we're using can be seen below:

[bash]
# gcc -m32 -masm=intel -S hello.c -o hello.s
[/bash]

This command takes the hello.c program and compiles it as a 32-bit program into assembly instructions that are saved into the hello.s file.

The hello.s file now looks as follows:

[plain]
    .file "hello.c"
    .intel_syntax noprefix
    .section .rodata
.LC0:
    .string "Hello World!"
    .text
.globl main
    .type main, @function
main:
    push ebp
    mov ebp, esp
    and esp, -16
    sub esp, 16
    mov eax, OFFSET FLAT:.LC0
    mov DWORD PTR [esp], eax
    call printf
    mov eax, 0
    leave
    ret
    .size main, .-main
    .ident "GCC: (Gentoo 4.5.4 p1.0, pie-0.4.7) 4.5.4"
    .section .note.GNU-stack,"",@progbits
[/plain]

The .file directive states the original source file name, which is normally used by debuggers. The .intel_syntax line specifies that we're using Intel syntax assembly and not AT&T syntax. Afterwards we're defining the .rodata section, which is used for read-only data. In our case the .rodata section contains only the zero-terminated string "Hello World!", which can be accessed through the .LC0 label. Then we're defining the .text section, which is used for the code of the program.
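To make the "zero-terminated" part concrete, here is a minimal C sketch that measures how many bytes the string occupies. The names hello, hello_storage_size and hello_visible_length are made up for this illustration and do not appear in the generated assembly. The stored size includes the terminating zero byte, which is why the readelf listing of the object file further down reports a .rodata size of 0x0d (13) bytes for a 12-character string.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical name: the same literal the compiler places in .rodata. */
static const char hello[] = "Hello World!";

/* Bytes needed to store the literal, including the terminating zero byte. */
size_t hello_storage_size(void)
{
    return sizeof hello;
}

/* Number of visible characters, not counting the terminating zero byte. */
size_t hello_visible_length(void)
{
    return strlen(hello);
}
```

Calling hello_storage_size() gives 13 and hello_visible_length() gives 12; the one-byte difference is the terminator that the .string directive appends.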

First we must define the main function (notice the .type main, @function directive), which is globally visible (notice the .globl main directive). Everything from the main: label to the ret instruction is the actual code of the program. That code first sets up the stack frame by pushing the value of the EBP register to the stack and moving the value of ESP to EBP. The "and esp, -16" instruction rounds the stack pointer down to a multiple of 16 bytes, because some operations (SSE loads and stores, for example) require or run faster with 16-byte aligned data. It is not the result of an optimization flag: we didn't pass any -O option, so gcc compiles without optimization (-O0). gcc emits the instruction in main because its 32-bit x86 target keeps the stack aligned to 16 bytes by default. Then we're subtracting 16 bytes from ESP to reserve stack space, which here holds the outgoing argument for printf. Next, the address of .LC0 (our "Hello World!" string) is loaded into EAX and stored at the top of the stack, where it becomes the first and only argument to the printf function that is called right after. The printf function prints that string on the screen and returns to main, which sets the return value in EAX to 0, tears down the stack frame with leave and returns with ret.
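The effect of "and esp, -16" can be reproduced in C. The sketch below is only an illustration: the function name align_down_16 and the sample stack addresses are assumptions, not values taken from a real run. As a 32-bit two's complement number, -16 is 0xFFFFFFF0, so the AND clears the lowest four bits and rounds the value down to the nearest multiple of 16.

```c
#include <stdint.h>

/* Hypothetical helper: what `and esp, -16` does to a 32-bit stack pointer. */
uint32_t align_down_16(uint32_t esp)
{
    /* -16 in 32-bit two's complement is 0xFFFFFFF0: clear the low 4 bits. */
    return esp & UINT32_C(0xFFFFFFF0);
}
```

For example, an assumed stack pointer of 0xbffff7ac becomes 0xbffff7a0, while a value that is already aligned is left unchanged. The stack grows towards lower addresses, so rounding down never overwrites data that is already on the stack.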

The .size directive sets the size of the main function. The expression .-main subtracts the address of the main label from the current location (the dot), which yields the exact size of the function main; that value is written to the object file. The .ident directive saves the "GCC: (Gentoo 4.5.4 p1.0, pie-0.4.7) 4.5.4" string to the object file in order to record which compiler was used to compile the program.

Hello world program: The object file

We've seen the assembly code that gcc generated directly from the corresponding C source code. But without the assembler and linker we can't produce a runnable executable. To assemble the assembly code into an object file, we must use the -c option with gcc, which only compiles/assembles the input file but does not link it. To obtain the object file from the assembly code we need to run the command below:

[bash]
# gcc -m32 -masm=intel -c hello.s -o hello.o
# file hello.o
hello.o: ELF 32-bit LSB relocatable, Intel 80386, version 1 (SYSV), not stripped
[/bash]

We can see that hello.o is an ELF 32-bit relocatable object file, which is not linked yet and is therefore not an executable. If we try to run it anyway, it will fail as shown below:

[bash]
# chmod +x hello.o
# ./hello.o
bash: ./hello.o: cannot execute binary file
[/bash]
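The difference between a relocatable file and an executable is recorded in the ELF header itself, in the 16-bit e_type field that follows the 16 identification bytes. The following C function is a rough sketch of how that field can be read from the first bytes of a file. The function name elf_file_type and the enum names are invented for this example, and it handles only little-endian files such as ours. A value of 1 means REL (relocatable file) and 2 means EXEC (executable file).

```c
#include <stddef.h>
#include <string.h>

/* Illustrative names for the e_type values defined by the ELF format. */
enum { TYPE_REL = 1, TYPE_EXEC = 2 };

/*
 * Hypothetical helper: return the e_type field of a little-endian ELF
 * header, or -1 if the buffer is too short, lacks the ELF magic bytes,
 * or is not marked as little endian.
 */
int elf_file_type(const unsigned char *hdr, size_t len)
{
    static const unsigned char magic[4] = { 0x7f, 'E', 'L', 'F' };

    if (hdr == NULL || len < 18 || memcmp(hdr, magic, 4) != 0)
        return -1;
    if (hdr[5] != 1)            /* byte 5 (EI_DATA): 1 = little endian */
        return -1;
    /* e_type is stored in bytes 16 and 17, least significant byte first. */
    return hdr[16] | (hdr[17] << 8);
}
```

Fed the first 18 bytes of hello.o, this sketch would return TYPE_REL, which is the reason the shell refuses to execute the file.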

We can read the contents of the object file with the readelf program as follows:

[plain]

# readelf -a hello.o

ELF Header:

Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00

Class: ELF32

Data: 2's complement, little endian

Version: 1 (current)

OS/ABI: UNIX - System V

ABI Version: 0

Type: REL (Relocatable file)

Machine: Intel 80386

Version: 0x1

Entry point address: 0x0

Start of program headers: 0 (bytes into file)

Start of section headers: 224 (bytes into file)

Flags: 0x0

Size of this header: 52 (bytes)

Size of program headers: 0 (bytes)

Number of program headers: 0

Size of section headers: 40 (bytes)

Number of section headers: 11

Section header string table index: 8

Section Headers:

[Nr] Name Type Addr Off Size ES Flg Lk Inf Al

[ 0] NULL 00000000 000000 000000 00 0 0 0

[ 1] .text PROGBITS 00000000 000034 00001d 00 AX 0 0 4

[ 2] .rel.text REL 00000000 000350 000010 08 9 1 4

[ 3] .data PROGBITS 00000000 000054 000000 00 WA 0 0 4

[ 4] .bss NOBITS 00000000 000054 000000 00 WA 0 0 4

[ 5] .rodata PROGBITS 00000000 000054 00000d 00 A 0 0 1

[ 6] .comment PROGBITS 00000000 000061 00002b 01 MS 0 0 1

[ 7] .note.GNU-stack PROGBITS 00000000 00008c 000000 00 0 0 1

[ 8] .shstrtab STRTAB 00000000 00008c 000051 00 0 0 1

[ 9] .symtab SYMTAB 00000000 000298 0000a0 10 10 8 4

[10] .strtab STRTAB 00000000 000338 000015 00 0 0 1

Key to Flags:

W (write), A (alloc), X (execute), M (merge), S (strings)

I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)

O (extra OS processing required) o (OS specific), p (processor specific)

There are no section groups in this file.

There are no program headers in this file.

Relocation section '.rel.text' at offset 0x350 contains 2 entries:

Offset Info Type Sym.Value Sym. Name

0000000a 00000501 R_386_32 00000000 .rodata

00000012 00000902 R_386_PC32 00000000 printf

There are no unwind sections in this file.

Symbol table '.symtab' contains 10 entries:

Num: Value Size Type Bind Vis Ndx Name

0: 00000000 0 NOTYPE LOCAL DEFAULT UND

1: 00000000 0 FILE LOCAL DEFAULT ABS hello.c

2: 00000000 0 SECTION LOCAL DEFAULT 1

3: 00000000 0 SECTION LOCAL DEFAULT 3

4: 00000000 0 SECTION LOCAL DEFAULT 4

5: 00000000 0 SECTION LOCAL DEFAULT 5

6: 00000000 0 SECTION LOCAL DEFAULT 7

7: 00000000 0 SECTION LOCAL DEFAULT 6

8: 00000000 29 FUNC GLOBAL DEFAULT 1 main

9: 00000000 0 NOTYPE GLOBAL DEFAULT UND printf

No version information found in this file.

[/plain]

We can see that the file is an ELF object file that has 11 section headers. The first section header is null. The second section header is .text, which contains the executable instructions of the program. The .rel.text section holds the relocation information for the .text section. The relocation entries must be present because our code refers to addresses that are not known yet, such as the external printf function and the string in .rodata; the linker fills them in when it builds the executable.

In the output above, we can see that .rel.text holds two relocation entries: one for .rodata and one for printf. The .data section holds initialized data, while the .bss section holds uninitialized data that the program uses. The .rodata section holds read-only data that can be used by the program; this is where our "Hello World!" string is stored. The .comment section holds version information, here the compiler identification string written by the .ident directive, and .note.GNU-stack holds some additional data that I won't describe here. The .shstrtab section holds section names, .strtab holds the names of the symbols and .symtab holds the symbol table.
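The Info column of the two relocation entries packs two values into one 32-bit number: the upper 24 bits are an index into the symbol table and the lowest 8 bits are the relocation type. The sketch below decodes it in C. The function names rel_symbol_index and rel_type are chosen for this example only; the ELF format describes the same split with the macros ELF32_R_SYM and ELF32_R_TYPE.

```c
#include <stdint.h>

/* Hypothetical helper: symbol table index stored in the upper 24 bits. */
uint32_t rel_symbol_index(uint32_t r_info)
{
    return r_info >> 8;
}

/* Hypothetical helper: relocation type stored in the lowest 8 bits. */
uint32_t rel_type(uint32_t r_info)
{
    return r_info & 0xffu;
}
```

For the first entry, 0x00000501 decodes to symbol 5 and type 1 (R_386_32); symbol 5 in the .symtab listing is the section symbol for section 5, which is .rodata. For the second entry, 0x00000902 decodes to symbol 9 and type 2 (R_386_PC32), and symbol 9 is printf.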

We can quickly see that the assembly code defined only the .rodata and .text sections (plus the empty .note.GNU-stack marker), but when we translated the assembly code into the object file, quite a few sections were added to the file. Those sections are needed to successfully link the executable and properly execute the program.

Hello world program: The executable

The last step is to actually link the object file to make an executable. To do that, we must execute the command below:

[plain]
# gcc -m32 hello.o -o hello
# ./hello
Hello World!
[/plain]

We've linked the object file hello.o into the executable ./hello and executed it. Upon execution, the program printed the "Hello World!" string as it should. If we look at the ELF file with readelf again, we can see that a lot of other information and many more sections were added to the executable, as can be seen below:

[plain]

$ readelf -a hello

ELF Header:

Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00

Class: ELF32

Data: 2's complement, little endian

Version: 1 (current)

OS/ABI: UNIX - System V

ABI Version: 0

Type: EXEC (Executable file)

Machine: Intel 80386

Version: 0x1

Entry point address: 0x8048330

Start of program headers: 52 (bytes into file)

Start of section headers: 4392 (bytes into file)

Flags: 0x0

Size of this header: 52 (bytes)

Size of program headers: 32 (bytes)

Number of program headers: 10

Size of section headers: 40 (bytes)

Number of section headers: 30

Section header string table index: 27

Section Headers:

[Nr] Name Type Addr Off Size ES Flg Lk Inf Al

[ 0] NULL 00000000 000000 000000 00 0 0 0

[ 1] .interp PROGBITS 08048174 000174 000013 00 A 0 0 1

[ 2] .note.ABI-tag NOTE 08048188 000188 000020 00 A 0 0 4

[ 3] .hash HASH 080481a8 0001a8 000028 04 A 5 0 4

[ 4] .gnu.hash GNU_HASH 080481d0 0001d0 000020 04 A 5 0 4

[ 5] .dynsym DYNSYM 080481f0 0001f0 000050 10 A 6 1 4

[ 6] .dynstr STRTAB 08048240 000240 00004c 00 A 0 0 1

[ 7] .gnu.version VERSYM 0804828c 00028c 00000a 02 A 5 0 2

[ 8] .gnu.version_r VERNEED 08048298 000298 000020 00 A 6 1 4

[ 9] .rel.dyn REL 080482b8 0002b8 000008 08 A 5 0 4

[10] .rel.plt REL 080482c0 0002c0 000018 08 A 5 12 4

[11] .init PROGBITS 080482d8 0002d8 000017 00 AX 0 0 4

[12] .plt PROGBITS 080482f0 0002f0 000040 04 AX 0 0 16

[13] .text PROGBITS 08048330 000330 00019c 00 AX 0 0 16

[14] .fini PROGBITS 080484cc 0004cc 00001c 00 AX 0 0 4

[15] .rodata PROGBITS 080484e8 0004e8 000015 00 A 0 0 4

[16] .eh_frame_hdr PROGBITS 08048500 000500 000014 00 A 0 0 4

[17] .eh_frame PROGBITS 08048514 000514 000040 00 A 0 0 4

[18] .ctors PROGBITS 08049f0c 000f0c 000008 00 WA 0 0 4

[19] .dtors PROGBITS 08049f14 000f14 000008 00 WA 0 0 4

[20] .jcr PROGBITS 08049f1c 000f1c 000004 00 WA 0 0 4

[21] .dynamic DYNAMIC 08049f20 000f20 0000d0 08 WA 6 0 4

[22] .got PROGBITS 08049ff0 000ff0 000004 04 WA 0 0 4

[23] .got.plt PROGBITS 08049ff4 000ff4 000018 04 WA 0 0 4

[24] .data PROGBITS 0804a00c 00100c 000008 00 WA 0 0 4

[25] .bss NOBITS 0804a014 001014 000008 00 WA 0 0 4

[26] .comment PROGBITS 00000000 001014 00002a 01 MS 0 0 1

[27] .shstrtab STRTAB 00000000 00103e 0000e9 00 0 0 1

[28] .symtab SYMTAB 00000000 0015d8 000340 10 29 32 4

[29] .strtab STRTAB 00000000 001918 00014d 00 0 0 1

Key to Flags:

W (write), A (alloc), X (execute), M (merge), S (strings)

I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)

O (extra OS processing required) o (OS specific), p (processor specific)

There are no section groups in this file.

Program Headers:

Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align

PHDR 0x000034 0x08048034 0x08048034 0x00140 0x00140 R E 0x4

INTERP 0x000174 0x08048174 0x08048174 0x00013 0x00013 R 0x1

[Requesting program interpreter: /lib/ld-linux.so.2]

LOAD 0x000000 0x08048000 0x08048000 0x00554 0x00554 R E 0x1000

LOAD 0x000f0c 0x08049f0c 0x08049f0c 0x00108 0x00110 RW 0x1000

DYNAMIC 0x000f20 0x08049f20 0x08049f20 0x000d0 0x000d0 RW 0x4

NOTE 0x000188 0x08048188 0x08048188 0x00020 0x00020 R 0x4

GNU_EH_FRAME 0x000500 0x08048500 0x08048500 0x00014 0x00014 R 0x4

GNU_STACK 0x000000 0x00000000 0x00000000 0x00000 0x00000 RW 0x4

GNU_RELRO 0x000f0c 0x08049f0c 0x08049f0c 0x000f4 0x000f4 R 0x1

PAX_FLAGS 0x000000 0x00000000 0x00000000 0x00000 0x00000 0x4

Section to Segment mapping:

Segment Sections...

00

01 .interp

02 .interp .note.ABI-tag .hash .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rel.dyn .rel.plt .init .plt .text .fini .rodata .eh_frame_hdr .eh_frame

03 .ctors .dtors .jcr .dynamic .got .got.plt .data .bss

04 .dynamic

05 .note.ABI-tag

06 .eh_frame_hdr

07

08 .ctors .dtors .jcr .dynamic .got

09

Dynamic section at offset 0xf20 contains 21 entries:

Tag Type Name/Value

0x00000001 (NEEDED) Shared library: [libc.so.6]

0x0000000c (INIT) 0x80482d8

0x0000000d (FINI) 0x80484cc

0x00000004 (HASH) 0x80481a8

0x6ffffef5 (GNU_HASH) 0x80481d0

0x00000005 (STRTAB) 0x8048240

0x00000006 (SYMTAB) 0x80481f0

0x0000000a (STRSZ) 76 (bytes)

0x0000000b (SYMENT) 16 (bytes)

0x00000015 (DEBUG) 0x0

0x00000003 (PLTGOT) 0x8049ff4

0x00000002 (PLTRELSZ) 24 (bytes)

0x00000014 (PLTREL) REL

0x00000017 (JMPREL) 0x80482c0

0x00000011 (REL) 0x80482b8

0x00000012 (RELSZ) 8 (bytes)

0x00000013 (RELENT) 8 (bytes)

0x6ffffffe (VERNEED) 0x8048298

0x6fffffff (VERNEEDNUM) 1

0x6ffffff0 (VERSYM) 0x804828c

0x00000000 (NULL) 0x0

Relocation section '.rel.dyn' at offset 0x2b8 contains 1 entries:

Offset Info Type Sym.Value Sym. Name

08049ff0 00000206 R_386_GLOB_DAT 00000000 __gmon_start__

Relocation section '.rel.plt' at offset 0x2c0 contains 3 entries:

Offset Info Type Sym.Value Sym. Name

0804a000 00000107 R_386_JUMP_SLOT 00000000 printf

0804a004 00000207 R_386_JUMP_SLOT 00000000 __gmon_start__

0804a008 00000307 R_386_JUMP_SLOT 00000000 __libc_start_main

There are no unwind sections in this file.

Symbol table '.dynsym' contains 5 entries:

Num: Value Size Type Bind Vis Ndx Name

0: 00000000 0 NOTYPE LOCAL DEFAULT UND

1: 00000000 0 FUNC GLOBAL DEFAULT UND printf@GLIBC_2.0 (2)

2: 00000000 0 NOTYPE WEAK DEFAULT UND __gmon_start__

3: 00000000 0 FUNC GLOBAL DEFAULT UND __libc_start_main@GLIBC_2.0 (2)

4: 080484ec 4 OBJECT GLOBAL DEFAULT 15 _IO_stdin_used

Symbol table '.symtab' contains 52 entries:

Num: Value Size Type Bind Vis Ndx Name

0: 00000000 0 NOTYPE LOCAL DEFAULT UND

1: 08048174 0 SECTION LOCAL DEFAULT 1

2: 08048188 0 SECTION LOCAL DEFAULT 2

3: 080481a8 0 SECTION LOCAL DEFAULT 3

4: 080481d0 0 SECTION LOCAL DEFAULT 4

5: 080481f0 0 SECTION LOCAL DEFAULT 5

6: 08048240 0 SECTION LOCAL DEFAULT 6

7: 0804828c 0 SECTION LOCAL DEFAULT 7

8: 08048298 0 SECTION LOCAL DEFAULT 8

9: 080482b8 0 SECTION LOCAL DEFAULT 9

10: 080482c0 0 SECTION LOCAL DEFAULT 10

11: 080482d8 0 SECTION LOCAL DEFAULT 11

12: 080482f0 0 SECTION LOCAL DEFAULT 12

13: 08048330 0 SECTION LOCAL DEFAULT 13

14: 080484cc 0 SECTION LOCAL DEFAULT 14

15: 080484e8 0 SECTION LOCAL DEFAULT 15

16: 08048500 0 SECTION LOCAL DEFAULT 16

17: 08048514 0 SECTION LOCAL DEFAULT 17

18: 08049f0c 0 SECTION LOCAL DEFAULT 18

19: 08049f14 0 SECTION LOCAL DEFAULT 19

20: 08049f1c 0 SECTION LOCAL DEFAULT 20

21: 08049f20 0 SECTION LOCAL DEFAULT 21

22: 08049ff0 0 SECTION LOCAL DEFAULT 22

23: 08049ff4 0 SECTION LOCAL DEFAULT 23

24: 0804a00c 0 SECTION LOCAL DEFAULT 24

25: 0804a014 0 SECTION LOCAL DEFAULT 25

26: 00000000 0 SECTION LOCAL DEFAULT 26

27: 00000000 0 FILE LOCAL DEFAULT ABS hello.c

28: 08049f0c 0 NOTYPE LOCAL DEFAULT 18 __init_array_end

29: 08049f20 0 OBJECT LOCAL DEFAULT 21 _DYNAMIC

30: 08049f0c 0 NOTYPE LOCAL DEFAULT 18 __init_array_start

31: 08049ff4 0 OBJECT LOCAL DEFAULT 23 _GLOBAL_OFFSET_TABLE_

32: 08048490 5 FUNC GLOBAL DEFAULT 13 __libc_csu_fini

33: 08048495 0 FUNC GLOBAL HIDDEN 13 __i686.get_pc_thunk.bx

34: 0804a00c 0 NOTYPE WEAK DEFAULT 24 data_start

35: 00000000 0 FUNC GLOBAL DEFAULT UND printf@@GLIBC_2.0

36: 0804a014 0 NOTYPE GLOBAL DEFAULT ABS _edata

37: 080484cc 0 FUNC GLOBAL DEFAULT 14 _fini

38: 08049f18 0 OBJECT GLOBAL HIDDEN 19 __DTOR_END__

39: 0804a00c 0 NOTYPE GLOBAL DEFAULT 24 __data_start

40: 00000000 0 NOTYPE WEAK DEFAULT UND __gmon_start__

41: 0804a010 0 OBJECT GLOBAL HIDDEN 24 __dso_handle

42: 080484ec 4 OBJECT GLOBAL DEFAULT 15 _IO_stdin_used

43: 00000000 0 FUNC GLOBAL DEFAULT UND __libc_start_main@@GLIBC_

44: 08048430 90 FUNC GLOBAL DEFAULT 13 __libc_csu_init

45: 0804a01c 0 NOTYPE GLOBAL DEFAULT ABS _end

46: 08048330 0 FUNC GLOBAL DEFAULT 13 _start

47: 080484e8 4 OBJECT GLOBAL DEFAULT 15 _fp_hw

48: 0804a014 0 NOTYPE GLOBAL DEFAULT ABS __bss_start

49: 08048404 29 FUNC GLOBAL DEFAULT 13 main

50: 00000000 0 NOTYPE WEAK DEFAULT UND _Jv_RegisterClasses

51: 080482d8 0 FUNC GLOBAL DEFAULT 11 _init

Histogram for bucket list length (total of 3 buckets):

Length Number % of total Coverage

0 0 ( 0.0%)

1 2 ( 66.7%) 50.0%

2 1 ( 33.3%) 100.0%

Histogram for `.gnu.hash' bucket list length (total of 2 buckets):

Length Number % of total Coverage

0 1 ( 50.0%)

1 1 ( 50.0%) 100.0%

Version symbols section '.gnu.version' contains 5 entries:

Addr: 000000000804828c Offset: 0x00028c Link: 5 (.dynsym)

000: 0 (*local*) 2 (GLIBC_2.0) 0 (*local*) 2 (GLIBC_2.0)

004: 1 (*global*)

Version needs section '.gnu.version_r' contains 1 entries:

Addr: 0x0000000008048298 Offset: 0x000298 Link: 6 (.dynstr)

000000: Version: 1 File: libc.so.6 Cnt: 1

0x0010: Name: GLIBC_2.0 Flags: none Version: 2

Notes at offset 0x00000188 with length 0x00000020:

Owner Data size Description

GNU 0x00000010 NT_GNU_ABI_TAG (ABI version tag)

OS: Linux, ABI: 2.6.9

[/plain]
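A few numbers in the ELF headers of both listings can be cross-checked with simple arithmetic, because a header table occupies entry size times entry count bytes. The C sketch below does that calculation; the function names table_size and table_end are assumptions made for this example.

```c
#include <stdint.h>

/* Hypothetical helper: bytes occupied by a table of fixed-size entries. */
uint32_t table_size(uint32_t entry_size, uint32_t count)
{
    return entry_size * count;
}

/* Hypothetical helper: file offset of the first byte after the table. */
uint32_t table_end(uint32_t start, uint32_t entry_size, uint32_t count)
{
    return start + table_size(entry_size, count);
}
```

In the executable, 10 program headers of 32 bytes give table_size(32, 10) = 320 = 0x140, which is the FileSiz of the PHDR segment, and the table starts at offset 52, right after the 52-byte ELF header. The 30 section headers of 40 bytes start at offset 4392, so table_end(4392, 40, 30) = 5592 = 0x15d8, which is exactly the offset of .symtab. The object file follows the same pattern: table_end(224, 40, 11) = 664 = 0x298, again the offset of its .symtab.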

Conclusion

We've now seen how a simple program written in C is converted into assembly code, then into an object file and finally into an executable file. The C code has no notion of sections, while the assembly code defined two that matter: .rodata and .text. When we assembled it into an object file and finally linked it into the executable, the file gained more and more sections, which are needed for the program to be linked and executed successfully.

Dejan Lukan

Dejan Lukan is a security researcher for InfoSec Institute and a penetration tester from Slovenia. He is very interested in finding new bugs in real-world software products with source code analysis, fuzzing and reverse engineering. He also has a great passion for developing his own simple scripts for security-related problems and learning about new hacking techniques. He knows a great deal about programming languages, as he can write in a couple of dozen of them. His passions also include antivirus bypassing techniques, malware research and operating systems, mainly Linux, Windows and BSD. He also has his own blog available here: http://www.proteansec.com/.