Arrays, Structs and Linked Lists
Reverse engineering is the practice of analyzing a software system, either in whole or in part, to extract design and implementation information 1. This article introduces arrays, structs and linked lists in C and discusses some scenarios where each can be applied. Finally, it shows the assembly code generated from C code that uses structs, arrays and linked lists, and explains how recognizing these constructs helps when reverse engineering malware.
Arrays
An array is a collection of data items, all of the same type, which can be accessed using a common name 2. In C, it is a variable that stores a fixed number of values of one type. Suppose we need a program that accepts the ages of 100 students in a school as input and computes their average age. Although this could be done by declaring 100 separate integer variables, an array is far more convenient: all the ages share one name and can be processed in a loop.
How to Declare an Array
To declare an array in C, we specify the type of its elements, the array's name and the number of elements. An integer array named a with 100 elements is declared with the statement int a[100];
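This statement reserves a contiguous block of memory large enough for 100 integers. The elements are initialized to 0 only when the array has static storage duration (for example, when it is declared outside any function) or is given an initializer such as int a[100] = {0};. An array declared inside a function without an initializer holds indeterminate values. The memory layout for this array is shown in Figure 2. Indexes run from 0 for the first element, a[0], to 99 for the last element, a[99].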
How to Initialize an Array
The statement in Figure 3 initializes a 7-element array of type double called height. The compiler sizes the array at 7 elements from the initializer list and sets height[0] to 17.6, height[1] to 19.3, height[2] to 20.1, height[3] to 11.4, height[4] to 8.7, height[5] to 6.3 and height[6] to 8.0.
How to Access an Array
Array elements are accessed by index. As shown in Figure 4, an element is read by writing the array name followed by the element's index in square brackets. Here, the statement declares a variable, newStudent, and assigns it the fourth element of the array (index 3). Let's now consider how to use the three array concepts together: declaration, initialization and access.
How Assembly Code Works
In this article, all C code is converted to assembly using Compiler Explorer.
Machine language, which consists only of binary, is a computer's native language: the processor executes only instructions encoded in machine code. Assembly language is one step up from machine language. It uses short descriptive words called mnemonics to represent machine code instructions. In Intel syntax, an instruction takes the form mnemonic destination, source (AT&T syntax reverses the operand order). All code in this article was run on an AMD64 (x86-64) processor.
Structs
A struct is a user-defined collection of variables grouped under a single name. Unlike an array, whose elements all share one type, a struct can hold members of different data types. The figures show how to declare, initialize and access struct variables in C; Figures 11a and 11b show the assembly equivalent of the C code, followed by the program's output.
Linked Lists
A linked list is a sequence of items arranged one after another 3. Each item in the list is connected to the next item via a link.
Each element of the list, known as a node, consists of two parts: the data and a reference (in C, a pointer) to the next node. The last node's reference is null, which marks the end of the list. The entry point into a linked list is called the head of the list 4. The data stored in a node may be a number, a string or any other type.
One disadvantage of a linked list compared with an array is that it does not allow direct access to individual nodes. To reach a particular item, you must start at the head of the list and follow the references until you arrive at that node. The code defines a node as a struct, initializes the nodes, links them together and prints their values.
Conclusion
Malware is usually distributed in a form that analysts cannot read directly, typically as a compiled binary. Analysts who understand how C code looks in assembly can use reverse engineering to work out what the malware is doing.
Malware authors may use arrays to store data of the same type, structs to group data of different types and linked lists to hold collections that grow or shrink at run time. A good understanding of these constructs helps analysts, for instance, to pinpoint the memory locations that hold variables during reverse engineering.
Sources
- https://link.springer.com/chapter/10.1007/978-3-642-04117-4_31
- https://www.cs.uic.edu/~jbell/CourseNotes/C_Programming/Arrays.html
- https://www.cs.bu.edu/fac/gkollios/cs113/Slides/linkedlist.pdf
- https://www.cs.cmu.edu/~adamchik/15-121/lectures/Linked%20Lists/linked%20lists.html