Important Code Constructs in Assembly Language: Advanced
We covered some basic operations and conditional statements in Part 1. In this article, we will look at how to recognize more complex data structures in assembly: arrays, structs, and linked lists. So without further ado, let's start.
Arrays
Arrays are used to store data items of the same type. Like other variables, arrays can be defined globally or locally, and the global/local distinction covered in Part 1 applies to them as well. Consider the code below:
int i;
int x[10];
for (i = 0; i < 10; i++)
{
    x[i] = i;
}
The code is self-explanatory: it assigns the current value of i to the element of array x at index i, looping ten times (i < 10). The main thing is to recognize how the assembly code traverses the array. We do not yet know the size of each element, and there has to be some base address from which every element of the array is located and assigned a value. Let's look at the assembly code.
mov [ebp+var_4], 0
jmp short loc_123456
loc_987654:
mov eax, [ebp+var_4]
add eax, 1
mov [ebp+var_4], eax
loc_123456:
cmp [ebp+var_4], 10
jge short loc_234567
mov ecx, [ebp+var_4]
mov [ebp+ecx*4+var_8], ecx
jmp short loc_987654
As we can see, the normal 'for' loop initialization and comparison happen first. Then the current value of i is moved into ecx. Why? It will be used as the index into array x. In the operand [ebp+ecx*4+var_8], var_8 gives the base address of the array x, ecx is the index, and the scale factor 4 is the size of each element in bytes (a 4-byte integer). Try substituting ecx = 0, 1, 2, 3, ... and you will see how each memory location of the array is assigned its value. So when recognizing arrays, look out for constructs like this that combine a base address with a scaled index variable.
Structs
If arrays fall short because your data is heterogeneous, another data structure, the struct, is there to help. Structs can hold items of different data types. If you remember my PE series, there are a lot of structs in the data directory. Consider the following code, which shows a struct definition, the declaration of a struct variable, and its memory allocation:
#include <stdlib.h>

struct Test_struct // struct definition
{
    int x[3];
    char b;
};

struct Test_struct *Test; // variable declaration

void testing(struct Test_struct *a)
{
    int i;
    a->b = 'l';
    for (i = 0; i < 3; i++)
    {
        a->x[i] = i;
    }
}

int main(void)
{
    Test = (struct Test_struct *) malloc(sizeof(struct Test_struct)); // memory allocation for struct
    testing(Test);
    return 0;
}
Note that I have declared a pointer to the structure and therefore use -> to access the structure members. With a plain struct variable, the dot operator (.) is used instead. Let's look at the assembly code. Since the code has several constructs, we will look at their assembly in pieces as well.
main:
push ebp
mov ebp, esp
push 10h
call malloc
add esp, 4
mov dword_126785, eax
mov eax, dword_126785
push eax
call loc_234567
add esp, 4
………
loc_234567:
push ebp
mov ebp, esp
push ecx
mov eax, [ebp+arg_0]
mov byte ptr [eax+0Ch], 6Ch
mov [ebp+var_4], 0
jmp short loc_123456
loc_987654:
mov eax, [ebp+var_4]
add eax, 1
mov [ebp+var_4], eax
loc_123456:
cmp [ebp+var_4], 3
jge short loc_12345
mov eax, [ebp+var_4]
mov edx, [ebp+var_4]
mov ecx, [ebp+arg_0]
mov [ecx+eax*4], edx
jmp short loc_987654
OK, so this looks a bit complex at first, but if we go instruction by instruction, it is simple. First, the result of sizeof is pushed onto the stack as the argument to malloc. On a typical 32-bit compiler this is 10h (16 bytes: twelve for the three integers, one for the char, and three bytes of padding). Then malloc is called, and add esp, 4 removes that argument from the stack. After that, dword_126785 (the global pointer Test) holds the base address of the structure, and it is passed to the testing function via push eax. In the testing function, the base address is read from the argument arg_0 into eax. The byte at offset 0Ch from that address is the member b, which receives the character 'l' (6Ch). Next, var_4 is the loop counter i, and the integer array is accessed through [ecx+eax*4], where ecx holds the base address of the struct (the array is its first member, so it starts at offset 0) and eax is the index that walks through the array. The key observation is that members of different types, a char and an integer array, are reached from the same base address. So this structure cannot be an array.
Linked List
Data items in a linked list are linked to each other, i.e. every record points to the next data item. However, the nodes are not guaranteed to be laid out in memory in that same order. To identify this data structure, one needs to find an object that contains a pointer to another object of the same type. Keep in mind that in this article we are only talking about singly linked lists. Consider the following linked list implementation with a Node structure.
#include <stdlib.h>

typedef struct node
{
    int a;
    struct node *next;
} Node;

int main(void)
{
    Node *start, *temp;
    int i;
    start = NULL;
    for (i = 0; i < 5; i++)
    {
        temp = (Node *) malloc(sizeof(Node)); // pointer to base address of structure
        temp->a = i; // storing i to a
        temp->next = start; // point to next node
        start = temp; // saving the value of temp in start
    }
    return 0;
}
The corresponding assembly code:
push ebp
mov ebp, esp
mov [ebp+var_4], 0
mov [ebp+var_8], 0
loc_987654:
cmp [ebp+var_8], 5
jge short loc_12345
push 8
call malloc
mov [ebp+var_C], eax
mov edx, [ebp+var_C]
mov eax, [ebp+var_8]
mov [edx], eax
mov eax, [ebp+var_4]
mov edx, [ebp+var_C]
mov [edx+4], eax
mov eax, [ebp+var_C]
mov [ebp+var_4], eax
jmp short loc_123456
loc_123456:
mov eax, [ebp+var_8]
add eax, 1
mov [ebp+var_8], eax
jmp short loc_987654
In this assembly, malloc first allocates 8 bytes for the first node, and the returned address is stored in the local that holds temp. Then the value of i is stored at offset 0 of the node (member a), and NULL, the initial value of start, is stored in next at [edx+4]. Finally, the node's address is saved in start. In the second iteration, a second node is allocated, its address is stored, and the current value of i is written to it. This time [edx+4] receives the address of the first node, and at the end of the iteration the address of the second node is saved in start, so that the third node will point to the second, and so on, forming a list. There can be many variations of this assembly code. The pattern to recognize is a next member that points to another struct of the same kind, again and again, thus forming a linked list.
That is all for the data structures we will discuss. As the goal of this series and the PE series is to develop malware analysis skills, I think we can now start looking at malware analysis, which I will begin in the next article.