Important Code Constructs in Assembly Language: Basics
This article continues the malware analysis course that began with the complete structure of the PE and ELF formats. Another concept that often helps in judging what a malicious program is doing is understanding its assembly code and how common code constructs look in a binary. In this article, we will learn to recognize these code constructs in assembly.
Note: This article assumes that the reader has a fair understanding of assembly language. If not, don't worry; I will try to explain things in the simplest form.
Scope of Variables
Variables are declared in code to hold values such as numbers and strings, and their scope is either local or global. Local variables are meaningful only within the function that declares them, whereas global variables can be used anywhere in the program. In disassembled code, references to global variables are made to fixed memory addresses, whereas local variables are referenced as an offset from the ebp register. For readers not familiar with assembly language: in typical 32-bit code, a new function starts with a prolog like
push ebp
mov ebp, esp
The first instruction saves the caller's ebp on the stack, and the second makes the current stack pointer the base of the new function's stack frame. Since the stack grows toward lower addresses, local variables sit below ebp and are accessed with negative offsets, for example mov eax, [ebp-4], which moves the value stored 4 bytes below ebp into eax. Global variables, by contrast, are not local to any function, so referencing them relative to ebp would make no sense. Instead they are referenced by their memory address, for example mov eax, dword_50CF60, where the global variable is at 0x50CF60.
Conditional Statement
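The if statement is one of the most common constructs in code logic. A very basic example assigns a = 1 and b = 2, then uses if/else to print "a & b are equal" when a == b and "a & b are not equal" otherwise.
Below is the assembly language for this: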
mov [ebp+var_4], 1
mov [ebp+var_8], 2
mov eax, [ebp+var_4]
cmp eax, [ebp+var_8]
jnz short loc_50101B
push offset aequalsb ; "a & b are equal\n"
……
push offset anotequalb ; "a & b are not equal"
……
Notice the combination of cmp (compare) and jnz (jump if not zero), which corresponds to if (a == b) in the code. Internally, cmp performs the same subtraction as the sub instruction, but it discards the result and only sets the flags: when the two operands are equal, the result is zero and the zero flag (ZF) is set. jnz jumps only when ZF is clear. So the code compares both variables, and if they are not equal, it jumps to loc_50101B and prints "a & b are not equal". If they are equal, the jump is not taken, execution falls through, and "a & b are equal" is printed. Note that the compiler tests the opposite of the source condition so that it can jump over the code of the true branch. Nested if statements show up as multiple cmp and jnz/jz pairs, each followed by a print or some other operation, or directly by another cmp instruction.
Loops
Loops repeat an operation, executing a block of instructions again and again until some condition is met. for and while loops are the most common ones in code logic.
A for loop has four components: initialization, comparison, execution (the loop body), and increment/decrement.
for (int i = 0; i < 10; i++)
{
    printf("Current value of i is %d\n", i);
}
In assembly, look for the same four components:
mov [ebp+var_4], 0
jmp short loc_102345
loc_987654:
mov eax, [ebp+var_4]
add eax, 1
mov [ebp+var_4], eax
loc_102345:
cmp [ebp+var_4], 0Ah
jge short loc_23456
mov ecx, [ebp+var_4]
push ecx
push offset iValue ; "Current Value of i is %d\n"
call printf
add esp, 8
jmp loc_987654
loc_23456:
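In the assembly above, the first instruction performs the initialization (i = 0), and then execution jumps to loc_102345 to compare i with 10 (0Ah). If i is greater than or equal to 10, jge jumps to loc_23456 and the loop ends; note that, as with the if statement, the compiler tests the inverse of the source condition i < 10. Otherwise, the body pushes i and the format string, calls printf, removes the two 4-byte arguments with add esp, 8, and jumps to loc_987654, which increments i. Execution then falls through to the comparison again (now i = 1 is compared with 10). If i is still below 10, the value is printed and incremented again, and this continues until i reaches 10.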
While loops are also common in code. For example:
int i = 0;
while (i < 10)
{
    printf("current value of i is %d\n", i);
    i++;
}
In assembly language, this becomes:
mov [ebp+var_4], 0
loc_102345:
cmp [ebp+var_4], 0Ah
jge short loc_234567
mov ecx, [ebp+var_4]
push ecx
push offset iValue ; "current value of i is %d\n"
call printf
add esp, 8
mov eax, [ebp+var_4]
add eax, 1
mov [ebp+var_4], eax
jmp short loc_102345
loc_234567:
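As this assembly shows, the while loop compiles to nearly the same instructions as the for loop: an initialization, a comparison with 0Ah followed by jge out of the loop, the loop body, an increment of i, and a jump back to the comparison. Only the order in which the compiler lays out these blocks may differ.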
Switch Statements
Programmers often use switch statements to choose among several branches based on the value of an integer variable. For example:
switch (i)
{
case 1:
    printf("Current value of i is %d\n", i + 1);
    break;
case 2:
    printf("Current value of i is %d\n", i + 1);
    break;
case 3:
    printf("Current value of i is %d\n", i + 1);
    break;
case 4:
    printf("Current value of i is %d\n", i + 1);
    break;
default:
    break;
}
In assembly, a switch often looks like a series of if statements: many cmp instructions, each paired with a conditional jump, as described earlier. This is typical when the case values are sparse, such as case 1, case 12, case 17, and the assembly then contains a long if/else chain. When the case values are consecutive, such as case 1, case 2, case 3, compilers often optimize the switch into a jump table, as below:
mov ecx, [ebp+var_4]
sub ecx, 1
mov [ebp+var_8], ecx
cmp [ebp+var_8], 3
ja loc_12345
mov edx, [ebp+var_8]
jmp ds:loc_987650[edx*4]
loc_234564:
…..
jmp loc_12345
loc_234565:
…..
jmp loc_12345
loc_234566:
…..
jmp loc_12345
loc_234567:
…..
jmp loc_12345
loc_12345:
; stack clean-up code
loc_987650 dd offset loc_234564 ; jump table
dd offset loc_234565
dd offset loc_234566
dd offset loc_234567
So what has happened here? First, the case variable i is loaded into ecx and decremented by 1, and that value is what gets compared and used as the index. Why? This is where jump tables come in: for consecutive case values, the compiler builds a jump table, an array of code addresses with one entry per case block. Because the table is indexed from 0 while the first case is 1, the compiler subtracts 1 so that case 1 maps to entry 0, case 2 to entry 1, and so on. The index is then compared with 3, the largest valid index (case 4 minus 1). If it is greater, ja jumps straight to the default case if there is one, or past the switch otherwise (loc_12345 here). Because ja is an unsigned comparison, a value of i below 1, which makes i - 1 wrap around to a huge unsigned number, also takes this path. Any other value selects an entry of the jump table: each entry is a 4-byte address, hence the scaling by 4 in [edx*4], and the indirect jmp lands directly on the matching case block.
These are the most basic constructs you can recognize in disassembled code. In the second part of this series, I will discuss more complex structures such as arrays, structures, and linked lists.