x86 Assembly Language Applicable To Reverse Engineering: The Basics – Part 1
Overview
x86 assembly language (ASM) is the lowest-level programming language that humans can still read comfortably, and one of the most primitive ones; it is a symbolic, human-readable form of machine language. Understanding assembly shows exactly how a computer executes a program, which sharpens our reasoning and makes it easier to code in any other programming language.
Intro to x86 Disassembly
Programs coded in assembly are generally small and run with very little overhead. Assembly is often loosely called machine language because each Central Processing Unit (CPU) understands only its own set of instructions, defined by its architecture. For x86, that instruction set is shared by all compatible 32-bit processors, whatever the vendor, which is what keeps software compatible across the many devices on the market.
That said, each assembly instruction is associated with a numeric code that is always the same: the mnemonic is simply a readable name for a low-level machine opcode (operation code). This article is not designed to teach you how to code in assembly language; the aim is to introduce the most common instructions you will meet when practicing reverse code engineering and handling disassemblers and debuggers, and to provide only a very basic introduction.
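To make the mnemonic / opcode relationship concrete, here is a minimal Python sketch of a lookup table for a handful of one-byte opcodes, assuming 32-bit mode. The table name `ONE_BYTE_OPCODES` and the function `mnemonics` are hypothetical helpers invented for this illustration; a real disassembler has to handle far more than single-byte instructions.

```python
# A few one-byte x86 opcodes as they decode in 32-bit mode.
# The table is deliberately tiny; it is an illustration, not a disassembler.
ONE_BYTE_OPCODES = {
    0x50: "PUSH EAX",
    0x58: "POP EAX",
    0x90: "NOP",
    0xC3: "RET",
    0xCC: "INT3",
}


def mnemonics(code: bytes) -> list:
    """Map each byte to its mnemonic, or to a raw 'db' line if unknown."""
    return [ONE_BYTE_OPCODES.get(b, f"db 0x{b:02X}") for b in code]


if __name__ == "__main__":
    for line in mnemonics(bytes([0x50, 0x90, 0x58, 0xC3])):
        print(line)
```

The same byte always maps to the same mnemonic, which is what lets a disassembler turn raw bytes back into readable instructions.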
Here we go:
Registers
To store information (of different values and sizes), each processor contains a set of small storage "boxes" called registers. They are among the most important parts of the CPU, and because they sit inside the processor, working on data held in registers is much faster than working on data in memory. We can consider three kinds of registers:
- General Registers: Used to manipulate data, to pass parameters when calling a DOS function, and to store intermediate results.
- Status Registers: As the term is used in this article, the index and pointer registers, which hold addresses and offsets rather than general data.
- Segment Registers: Used to store the starting address of a segment. It may be the address of the beginning of a program's instructions, the beginning of data, or the beginning of the stack.
Most of these registers can also be accessed as smaller 16-bit and 8-bit parts. General registers begin with the letters A, B, C and D, and are the most used registers.
AX, BX, CX and DX are 16-bit registers. Each of them can be broken down into two 8-bit registers, L and H (Low / High); for example, AX is made of AL and AH. Prefixing a 16-bit register with an "E" gives its 32-bit version: EAX, EBX, ECX and EDX. (Please note that there is no EAH or EAL: the low 16 bits of a 32-bit register are reachable as AX, BX, CX or DX, but the high 16 bits are not directly accessible.)
Logically, these registers can only contain values that fit their capacities. The number of bits (8, 16 or 32) determines that capacity: 8 bits hold up to 255d, 16 bits up to 65535d, and 32 bits up to 4 294 967 295d ("d" stands for decimal; these are the maximum unsigned values a register can contain).
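The register sizes and the AX / AH / AL breakdown can be checked with a few lines of Python. This is only a sketch that models a register as a plain integer; `max_unsigned` and `split_eax` are hypothetical helper names, not part of any tool.

```python
def max_unsigned(bits: int) -> int:
    """Largest unsigned value that fits in a register of the given width."""
    return (1 << bits) - 1


def split_eax(eax: int) -> dict:
    """Return the AX, AH and AL views of a 32-bit EAX value."""
    eax &= 0xFFFFFFFF          # keep only 32 bits
    ax = eax & 0xFFFF          # AX is the low 16 bits of EAX
    return {
        "AX": ax,
        "AH": (ax >> 8) & 0xFF,  # high byte of AX
        "AL": ax & 0xFF,         # low byte of AX
    }


if __name__ == "__main__":
    for bits in (8, 16, 32):
        print(bits, "bits ->", max_unsigned(bits))
    print({name: hex(v) for name, v in split_eax(0x12345678).items()})
```

Note that nothing in `split_eax` names the upper 16 bits of EAX: they can only be reached by shifting or masking the full 32-bit value.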
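Regarding Status Registers (more commonly called index and pointer registers), they do not have 8-bit parts, so they have neither H nor L. These registers are:
- DI: Destination Index
- SI: Source Index
- BP: Base Pointer
- IP: Instruction Pointer
- SP: Stack Pointer
All of these are 16-bit registers, and can be extended to 32-bit by adding an "E" as well (EDI, ESI, EBP, EIP, and ESP). Segment Registers, in turn, are used when storing and retrieving data in memory.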
To access memory, the CPU needs an address. This address is written in two parts: the first is called the "segment" and the second the "offset", so an address is expressed as segment:offset. The segment part is always 16 bits, while the offset is 16 or 32 bits depending on the mode.
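As a rough illustration of the segment:offset notation, the sketch below assumes 16-bit real mode, where the processor shifts the segment left by 4 bits and adds the offset. Protected mode, where the segment register holds a selector instead, is not modelled, and the function name `real_mode_linear` is hypothetical.

```python
def real_mode_linear(segment: int, offset: int) -> int:
    """Linear address of segment:offset in 16-bit real mode.

    Assumption: real mode only, and no address wrap-around is modelled.
    """
    segment &= 0xFFFF
    offset &= 0xFFFF
    return (segment << 4) + offset


if __name__ == "__main__":
    print(hex(real_mode_linear(0x1234, 0x0010)))
    print(hex(real_mode_linear(0xB800, 0x0000)))
```

Because the offset is 16 bits here, a single segment value can reach at most 64 KB of memory.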
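Segment Registers are read and written only as 16-bit values; in 16-bit code, each one designates a segment of up to 64 KB. 32-bit x86 code uses 32-bit offsets. The Segment Registers are:
- CS: Code Segment
- DS: Data Segment
- SS: Stack Segment
- ES, FS and GS: Extra Segments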
The Stack
The stack is a memory area that holds temporary data (function parameters, local variables, etc.) and behaves in a "Last In, First Out" manner, which means the first value stored on the stack (or pile) will be the last one out. The example always given to explain how the stack works is "plates stacked up to be washed": the last to be stacked will be the first to be washed.
Figure: Simple representation of a stack (wikipedia)
To be able to "push" data onto the stack and "pop" data from it, x86 assembly uses the instructions PUSH and POP.
Push Instruction
PUSH decrements the Stack Pointer (SP, or ESP in 32-bit code) and then stores its operand at the new top of the stack.
- PUSH AX
- PUSH BX
- PUSH 1986
This first pushes AX onto the stack, then BX, then the value 1986; it is 1986 that will be "popped" first.
Pop Instruction
POP loads the value stored at the location pointed to by SP into its operand, then increments the Stack Pointer.
- POP AX
- POP BX
- POP CX
Assuming AX = 1 and BX = 2 before the pushes, and following the Push example, the topmost element, the value 1986, is stored in AX. BX then receives 2, the old value of BX, and CX receives 1, the old value of AX. Now the stack is empty.
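The Last In, First Out behaviour can be replayed with a small Python model. This is a sketch under stated assumptions: a 16-bit stack, an arbitrary starting SP of 0x0100, and a sequence of three pushes (AX = 1, BX = 2, then the value 1986) followed by three pops into AX, BX and CX. The class `Stack16` is a hypothetical name used only for this illustration.

```python
class Stack16:
    """Toy model of a 16-bit stack that grows toward lower addresses."""

    def __init__(self, sp: int = 0x0100):
        self.sp = sp      # stack pointer (arbitrary start value)
        self.mem = {}     # address -> 16-bit word

    def push(self, value: int) -> None:
        self.sp -= 2                          # PUSH decrements SP first...
        self.mem[self.sp] = value & 0xFFFF    # ...then stores the word there

    def pop(self) -> int:
        value = self.mem.pop(self.sp)         # POP reads the top word first...
        self.sp += 2                          # ...then increments SP
        return value


regs = {"AX": 1, "BX": 2, "CX": 0}
stack = Stack16()

stack.push(regs["AX"])   # PUSH AX
stack.push(regs["BX"])   # PUSH BX
stack.push(1986)         # PUSH 1986

regs["AX"] = stack.pop()  # POP AX
regs["BX"] = stack.pop()  # POP BX
regs["CX"] = stack.pop()  # POP CX

print(regs, hex(stack.sp))
```

After the three pops the stack pointer is back at its starting value, which is how a debugger user can tell that pushes and pops are balanced.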
Flags, Conditional jumps, and Comparisons
Flags
Flags are indicators that many instructions can alter; they describe the result of logical, arithmetic and comparison instructions.
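Flags are grouped into the FLAGS register, a 16-bit register that 32-bit processors extend to EFLAGS. Counting from bit 0, the layout is:
- Bit 0: CF (Carry Flag)
- Bit 1: 1 < Reserved
- Bit 2: PF (Parity Flag)
- Bit 3: 0 < Reserved
- Bit 4: AF (Auxiliary carry Flag)
- Bit 5: 0 < Reserved
- Bit 6: ZF (Zero Flag)
- Bit 7: SF (Sign Flag)
- Bit 8: TF (Trap Flag)
- Bit 9: IF (Interrupt enable Flag)
- Bit 10: DF (Direction Flag)
- Bit 11: OF (Overflow Flag)
- Bits 12-13: IOPL (I/O Privilege Level)
- Bit 14: NT (Nested Task)
- Bit 15: 0 < Reserved
- Bit 16: RF (Resume Flag, EFLAGS only)
- Bit 17: VM (Virtual-8086 Mode, EFLAGS only)
The most widely used flags are CF, PF, ZF, SF and OF, since they are the ones tested by the conditional jumps described below.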
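When a debugger shows the flags register as a single number, the individual flags can be recovered by testing bits. The Python sketch below assumes the standard bit positions counted from bit 0 (CF = 0, PF = 2, AF = 4, ZF = 6, SF = 7, TF = 8, IF = 9, DF = 10, OF = 11); `FLAG_BITS` and `decode_flags` are hypothetical names.

```python
# Bit positions of the common flags, counted from bit 0.
FLAG_BITS = {
    "CF": 0, "PF": 2, "AF": 4, "ZF": 6, "SF": 7,
    "TF": 8, "IF": 9, "DF": 10, "OF": 11,
}


def decode_flags(value: int) -> dict:
    """Return each flag as 0 or 1 for a raw FLAGS / EFLAGS value."""
    return {name: (value >> bit) & 1 for name, bit in FLAG_BITS.items()}


if __name__ == "__main__":
    flags = decode_flags(0x246)
    print([name for name, bit in flags.items() if bit])
```

For the sample value 0x246, bits 1, 2, 6 and 9 are set, so the named flags that come out as 1 are PF, ZF and IF (bit 1 is the reserved bit that always reads 1).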
Flags are directly related to conditional statements, which leads us to introduce conditional jumps before talking about comparisons.
Conditional jumps
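This is an interesting part, because it helps to understand how a program reacts to the result of most operations, as recorded in the flags (1 or 0).

| Flag condition | Jump | Meaning |
|---|---|---|
| CF = 1 | JB | Jump if Below |
| CF = 1 | JC | Jump if Carry |
| CF = 1 | JNAE | Jump if Not Above or Equal |
| CF = 0 | JAE | Jump if Above or Equal |
| CF = 0 | JNB | Jump if Not Below |
| CF = 0 | JNC | Jump if Not Carry |
| ZF = 1 | JE | Jump if Equal |
| ZF = 1 | JZ | Jump if Zero |
| ZF = 0 | JNE | Jump if Not Equal |
| ZF = 0 | JNZ | Jump if Not Zero |
| CF = 1 or ZF = 1 | JBE | Jump if Below or Equal |
| CF = 1 or ZF = 1 | JNA | Jump if Not Above |
| CF = 0 and ZF = 0 | JA | Jump if Above |
| CF = 0 and ZF = 0 | JNBE | Jump if Not Below or Equal |
| PF = 1 | JP | Jump if Parity |
| PF = 1 | JPE | Jump if Parity Even |
| PF = 0 | JNP | Jump if Not Parity |
| PF = 0 | JPO | Jump if Parity Odd |
| OF = 1 | JO | Jump if Overflow |
| OF = 0 | JNO | Jump if Not Overflow |
| SF = 1 | JS | Jump if Sign (negative result) |
| SF = 0 | JNS | Jump if Not Sign |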
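It is also worth adding the jumps used after comparing signed numbers:

| Flag condition | Jump | Meaning |
|---|---|---|
| ZF = 0 and SF = OF | JG | Jump if Greater |
| ZF = 0 and SF = OF | JNLE | Jump if Not Less or Equal |
| SF = OF | JGE | Jump if Greater or Equal |
| SF = OF | JNL | Jump if Not Less |
| SF ≠ OF | JL | Jump if Less |
| SF ≠ OF | JNGE | Jump if Not Greater or Equal |
| ZF = 1 or SF ≠ OF | JLE | Jump if Less or Equal |
| ZF = 1 or SF ≠ OF | JNG | Jump if Not Greater |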
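The flag conditions behind the conditional jumps can be written as small predicates. The Python sketch below covers one mnemonic per condition (synonyms such as JZ / JE test the same flags); `JUMP_CONDITIONS` and `is_taken` are hypothetical names, and the flags are passed as a plain dictionary of 0 / 1 values.

```python
# One predicate per condition; each takes a dict with CF, ZF, SF and OF.
JUMP_CONDITIONS = {
    "JE":  lambda f: f["ZF"] == 1,
    "JNE": lambda f: f["ZF"] == 0,
    "JB":  lambda f: f["CF"] == 1,
    "JAE": lambda f: f["CF"] == 0,
    "JBE": lambda f: f["CF"] == 1 or f["ZF"] == 1,
    "JA":  lambda f: f["CF"] == 0 and f["ZF"] == 0,
    "JL":  lambda f: f["SF"] != f["OF"],
    "JGE": lambda f: f["SF"] == f["OF"],
    "JLE": lambda f: f["ZF"] == 1 or f["SF"] != f["OF"],
    "JG":  lambda f: f["ZF"] == 0 and f["SF"] == f["OF"],
}


def is_taken(mnemonic: str, flags: dict) -> bool:
    """Tell whether the given conditional jump is taken for these flags."""
    return JUMP_CONDITIONS[mnemonic.upper()](flags)


if __name__ == "__main__":
    sample = {"CF": 1, "ZF": 0, "SF": 1, "OF": 0}
    print([name for name in JUMP_CONDITIONS if is_taken(name, sample)])
```

Reading a jump this way, as a test on flags rather than on values, is exactly what a debugger user does when deciding which branch will execute.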
For a jump to "decide" whether it is taken or not, the flags must first be set by a test or comparison, using instructions like:
CMP instruction
CMP compares two operands but does not store a result. The program tests two values by subtracting them (it subtracts the second operand from the first), and sets the flags according to the result (the flags affected are OF, SF, ZF, AF, PF, and CF). For instance, if the two values are equal, the Zero Flag is set to 1; otherwise it is cleared to 0. CMP behaves like SUB, an arithmetic instruction, except that SUB keeps the result.
- CMP AX, BX
Here CMP computes AX - BX. If the result of this subtraction is zero, then AX is equal to BX, and ZF is set to 1.
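The effect of CMP on the flags can be modelled in Python. The sketch below assumes 16-bit operands (as in CMP AX, BX) and computes the six flags from the discarded subtraction; `cmp16` is a hypothetical helper name.

```python
def cmp16(a: int, b: int) -> dict:
    """Flags produced by CMP a, b on 16-bit operands (the result is discarded)."""
    a &= 0xFFFF
    b &= 0xFFFF
    result = (a - b) & 0xFFFF
    return {
        "CF": int(a < b),                                    # unsigned borrow
        "ZF": int(result == 0),                              # operands are equal
        "SF": (result >> 15) & 1,                            # top bit of the result
        "OF": int(((a ^ b) & (a ^ result) & 0x8000) != 0),   # signed overflow
        "PF": int(bin(result & 0xFF).count("1") % 2 == 0),   # even parity, low byte
        "AF": int((a & 0xF) < (b & 0xF)),                    # borrow out of bit 3
    }


if __name__ == "__main__":
    print(cmp16(5, 5))
    print(cmp16(1, 2))
```

With equal operands only ZF (and PF, since a zero byte has even parity) comes out as 1; with a smaller first operand CF is set instead, which is what the unsigned jumps test.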
To make it easier, after CMP a, b the jumps are TAKEN when:
- a is above b (unsigned numbers) -> JA
- a is below b (unsigned numbers) -> JB
- a is greater than b (signed numbers) -> JG
- a is less than b (signed numbers) -> JL
- a equals b (signed and unsigned numbers) -> JE or JZ
Just add "N" after "J" to get the opposite instruction (JA / JNA, JB / JNB…): for example, JNA is taken when a is NOT above b (unsigned numbers), and JNB when a is NOT below b.
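The difference between the unsigned jumps (JA / JB) and the signed ones (JG / JL) comes from how the same bits are interpreted. The Python sketch below assumes 16-bit operands and decides each jump by comparing the values directly instead of going through the flags; `to_signed16` and `taken_after_cmp` are hypothetical helper names.

```python
def to_signed16(value: int) -> int:
    """Reinterpret a 16-bit pattern as a two's-complement signed number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def taken_after_cmp(a: int, b: int) -> dict:
    """Which jumps would be taken after CMP a, b on 16-bit operands."""
    ua, ub = a & 0xFFFF, b & 0xFFFF          # unsigned view
    sa, sb = to_signed16(a), to_signed16(b)  # signed view
    return {
        "JA": ua > ub,
        "JB": ua < ub,
        "JG": sa > sb,
        "JL": sa < sb,
        "JE": ua == ub,
    }


if __name__ == "__main__":
    # 0xFFFF is 65535 when unsigned but -1 when signed.
    print(taken_after_cmp(0xFFFF, 1))
```

With 0xFFFF and 1, JA and JL are both taken: the first operand is above the second as an unsigned number, yet less than it as a signed one. Spotting which family of jumps a compiler emitted is a quick way to tell whether a variable was signed in the original source.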
We are reaching the end of this first part, in which we talked very basically about registers, the stack, flags, conditional jumps and the comparison instruction CMP. In the next part we will talk essentially about mathematical, logical and memory instructions.