Understanding variables in C
Variables in C are named storage locations that reserve space in memory. There are different types of variables, and each type requires a different amount of memory, but that amount is fixed in advance by the type (and the target platform). The type also determines the set of operations that can be applied to a variable.
Below, we’ll discuss how to identify variables when analyzing executables. C code snippets are shown alongside their assembly equivalents, together with how the stack is used when subroutines are called.
Variables
Depending on where they are declared, variables fall into two kinds: global variables and local variables. The sections below show how each kind can be identified with a debugger.
Global variables
The C code snippet below uses a global variable.
#include <stdio.h>

int a = 10;   /* global variable */

void main()
{
    printf("The value of a is %d\n", a);
}
When compiled, global variables are referenced by memory address, as the following excerpt shows when the binary is opened in OllyDbg.
MOV EBP,ESP
AND ESP,FFFFFFF0
SUB ESP,10
CALL global_v.004015D0
MOV EAX,DWORD PTR DS:[403004] ; |
MOV DWORD PTR SS:[ESP+4],EAX ; |
MOV DWORD PTR SS:[ESP],global_v.00404000 ; |ASCII "The value of a is %d"
CALL <JMP.&msvcrt.printf> ; printf
NOP
LEAVE
RETN
The value loaded into EAX is stored at address 00403004, as shown below.
0000000A is the hexadecimal form of 10, the value assigned to the variable “a” in the C program. This can be verified with a debugger.
The preceding program must first be compiled into a Windows 32-bit binary. Here, the MinGW cross-compiler is used on a Kali Linux machine to produce the Windows 32-bit executable. The command below can be used to do the same.
Note: The cross-compiler to produce Windows executables is not preinstalled in Kali Linux.
The executable file is opened in OllyDbg and the program is run in the debugger. Below is the result.
A breakpoint is set at address 0040151E and the program is run again. When the breakpoint is hit, the EAX register holds 00000001, as shown below.
The program is then single-stepped once so that the instruction below is executed.
The value of EAX is checked again. It should now hold the hex value 0000000A, as shown below.
The figure above shows that global variables are referenced by fixed memory addresses: the instruction reads DWORD PTR DS:[403004] directly.
Local variables
The C code snippet below uses a local variable. It initializes the variable “a” inside the main function, so “a” is local to main.
#include <stdio.h>

void main()
{
    int a = 10;   /* local variable, stored in main's stack frame */
    printf("The value of a is %d\n", a);
}
When compiled, local variables are referenced by stack addresses (offsets from ESP in this case) rather than by fixed memory addresses. This is how the code looks when opened in OllyDbg.
MOV EBP,ESP
AND ESP,FFFFFFF0
SUB ESP,20
CALL local_va.004015D0
MOV DWORD PTR SS:[ESP+1C],0A ; |
MOV EAX,DWORD PTR SS:[ESP+1C] ; |
MOV DWORD PTR SS:[ESP+4],EAX ; |
MOV DWORD PTR SS:[ESP],local_va.00404000 ; |ASCII "The value of a is %d"
CALL <JMP.&msvcrt.printf> ; printf
NOP
LEAVE
RETN
As shown above, MOV DWORD PTR SS:[ESP+1C],0A stores 10 into the local variable, and the next instruction moves the value at [ESP+1C] into EAX. To verify this, the stack is inspected at that location: 1C in hex is 28 in decimal, so the value 28 bytes from ESP is checked.
The value there is 0000000A, which is 10 in decimal, and it is what gets moved into the EAX register. Once this instruction executes, EAX holds the value stored at the stack address shown below.
Calling conventions in C
The snippet below shows a function call in C.
void test_function(int arg1, int arg2);

void main()
{
    test_function(10, 20);
}

void test_function(int arg1, int arg2){
    int x = 50;   /* local variables of test_function */
    int y = 40;
}
A function named test_function is called with two arguments, 10 and 20. Inside test_function, two local variables are initialized. The code below is taken from the disassembly of main in the binary compiled from the code above.
MOV EBP,ESP
AND ESP,FFFFFFF0
SUB ESP,10
CALL function.004015E0
MOV DWORD PTR SS:[ESP+4],14
MOV DWORD PTR SS:[ESP],0A
CALL function.00401535
NOP
LEAVE
RETN
The two MOV instructions just before CALL function.00401535 show what happens before a subroutine is called.
The two arguments, written in hex, are placed on the stack. Hex 14 and 0A are decimal 20 and 10, respectively, which are the arguments passed to test_function.
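The arguments are placed on the stack in reverse order: the second argument (20) is written to [ESP+4] first, followed by the first argument (10) at [ESP]. The first argument therefore ends up at the lowest address, the same layout that pushing the arguments from right to left (as the cdecl convention does) would produce. GCC simply reserves the space in advance with SUB ESP and uses MOV instead of PUSH.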
MOV EBP,ESP
SUB ESP,10
MOV DWORD PTR SS:[EBP-4],32
MOV DWORD PTR SS:[EBP-8],28
NOP
LEAVE
RETN
Once the function is called, its two local variables, shown in hex, are stored in its stack frame at offsets from EBP. 32 in hex is 50, the value of variable “x,” stored at [EBP-4]. Next, hex 28, which is decimal 40, is stored at [EBP-8] to initialize variable “y.”
Conclusion
Variables in C are named storage locations in memory. Their size is fixed in advance by their type, and the type also determines which operations can be applied to them. In the compiled assembly, both global and local variables appear as memory references: global variables are referenced by fixed memory addresses, whereas local variables are referenced by offsets into the current stack frame (relative to ESP or EBP). When a function is called, the caller places its arguments on the stack, and the callee stores its own local variables in its own stack frame.