Reverse engineering

x86 Assembly Language Applicable To Reverse Engineering: The Basics – Part 1

October 15, 2012 by Soufiane Tahiri

Overview

x86 Assembly language, or ASM, is the lowest-level programming language that humans can still read and one of the most primitive; it is a symbolic representation of machine language. If we can understand and handle assembly, we can understand exactly how a computer works, which gives us the logic, and above all the ability, to code in any other programming language.

Programs coded in assembly are generally small and can communicate with the machine much faster. Assembly is often called machine language because each Central Processing Unit (CPU) has its own set of instructions, defined by its architecture, which is the only thing it understands. For x86, this instruction set is the same on all compatible 32-bit processors, a consequence of the compatibility required across the many devices on the market.

That said, each assembly instruction corresponds to a machine code that is always the same: a mnemonic stands for each low-level opcode (operation code). This article is not designed to teach you how to code in assembly language; the aim is to introduce the most common instructions you will meet when practicing reverse code engineering and handling disassemblers / debuggers, and to provide only a very basic introduction.

Here we go:

Registers

To store information (of different values and sizes), each processor contains small storage locations, a kind of "boxes", called registers. They are among the most important parts of the CPU, and depending on the characteristics of the information to store (value, size, etc.), using registers instead of memory makes processing faster. We can consider three kinds of registers:

  1. General Registers: used to manipulate data, to pass parameters when calling a DOS function, and to store intermediate results.
  2. Status Registers: the name used here for the index and pointer registers, which hold addresses and offsets.
  3. Segment Registers: used to store the starting address of a segment. It may be the address of the beginning of a program's instructions, the beginning of data, or the beginning of the stack.

Almost all registers can be accessed as 16-bit parts, and the general registers as 8-bit parts too. General registers begin with the letters A, B, C and D, and are the most used registers.

  • AX – Accumulator Register: used to perform arithmetic operations or to pass a parameter to an interrupt.
  • BX – Base Register: used to perform arithmetic operations or as the base address of an array.
  • CX – Counter Register: generally used as a loop counter.
  • DX – Data Register: used to store data for functions, and as a port number in input / output operations.

AX, BX, CX and DX are 16-bit registers. Each of them can be broken down into two smaller 8-bit registers, L and H (Low / High), for example AX (AL, AH). To get the 32-bit registers we prefix the 16-bit names with an "E", which gives EAX, EBX, ECX and EDX. (Please note that there is no EAH or EAL, since the upper 16 bits of a 32-bit register are not directly accessible as a separate register.)
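
The way the smaller registers overlap the larger one can be sketched in a few lines of Python. This is only an illustration of the bit layout, not something a CPU runs; the function name split_register is made up for the example.

```python
def split_register(eax):
    """Return the AX, AH and AL views of a 32-bit EAX value."""
    eax &= 0xFFFFFFFF       # keep only 32 bits
    ax = eax & 0xFFFF       # AX is the low 16 bits of EAX
    ah = (ax >> 8) & 0xFF   # AH is the high byte of AX
    al = ax & 0xFF          # AL is the low byte of AX
    return ax, ah, al


ax, ah, al = split_register(0x12345678)
print(hex(ax), hex(ah), hex(al))  # 0x5678 0x56 0x78
```

The upper half of EAX (0x1234 here) has no register name of its own, which is why it does not appear in the result.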

Logically, these registers can only contain values up to their capacity. The number of bits (8, 16 or 32) determines that capacity: 8 bits = 255d, 16 bits = 65 535d, 32 bits = 4 294 967 295d ("d" stands for decimal; these are the maximum unsigned values a register can contain).
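
As a quick check of these capacities, the largest unsigned value that fits in n bits is 2^n - 1. A minimal Python sketch (the helper name max_unsigned is hypothetical):

```python
def max_unsigned(bits):
    """Largest unsigned value that fits in a register of the given width."""
    return (1 << bits) - 1


for width in (8, 16, 32):
    print(width, max_unsigned(width))
# 8 255
# 16 65535
# 32 4294967295
```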

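Regarding Status Registers (more precisely, index and pointer registers; the register that actually holds status is the Flags Register described later), they have no 8-bit parts in 16- and 32-bit code, so there is neither an H nor an L half. These registers are: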

• DI – Destination Index: mainly used as the destination address in string instructions, and generally associated with Segment Registers DS or ES.
• SI – Source Index: used as the source data address when manipulating strings, and generally associated with Segment Register DS.
• BP – Base Pointer: when a subroutine is called with "CALL", this register works together with the SS Segment Register to access data on the stack, and is generally used for indirect addressing.
• IP – Instruction Pointer: associated with Segment Register CS to indicate the next instruction to execute, and modified indirectly by jump instructions, subroutine calls and interrupts.
• SP – Stack Pointer: used with Segment Register SS (SS:SP) to indicate the top of the stack, that is, the last element pushed.

All of these are 16-bit registers, and can be extended to 32 bits by adding an "E" as well (EDI, ESI, EBP, EIP and ESP). Segment Registers, in turn, are used to locate the memory where data is stored and / or retrieved.

To access memory precisely, the CPU needs an address; this address is made of two parts. The first is called the "segment" and is always 16 bits; the second is called the "offset" and is 16 or 32 bits, which is why addresses are written as segment:offset.

Segment Registers are read and written only as 16-bit values. In 16-bit code each segment spans 64 KB; 32-bit x86 assembly uses 32-bit offsets. The Segment Registers are:

• CS – Code Segment: holds the address of the segment containing the CPU instructions referenced by the Instruction Pointer (IP); it is updated by far jump, far call and far return instructions.
• SS – Stack Segment: holds the address of the segment containing all data referenced by the Stack Pointer and Base Pointer.
• ES – Extra Segment: referenced by Destination Index (DI) in string manipulation.
• DS – Data Segment: holds the address of the segment containing all data referenced by the Accumulator Register, Base Register, Counter Register, Data Register, Source Index and Destination Index.
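
To make the segment:offset notation concrete, here is a small Python sketch of how a segment and an offset combine into one address. It assumes 16-bit real mode, where the segment value is shifted left by 4 bits and added to the offset; in 32-bit protected mode the segment register holds a selector instead and this formula does not apply. The function name is invented for the example.

```python
def real_mode_linear(segment, offset):
    """Linear address of segment:offset in 16-bit real mode (assumption)."""
    segment &= 0xFFFF   # segment registers are 16 bits wide
    offset &= 0xFFFF    # 16-bit offset, so a segment spans 64 KB
    return (segment << 4) + offset


print(hex(real_mode_linear(0x1234, 0x0010)))  # 0x12350
```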
The Stack

The stack is a memory area that holds temporary data (function parameters, variables, etc.) and is designed to behave in a "Last In, First Out" manner, which means the first value stored on the stack (or pile) will be the last one out. The example usually given to explain how the stack works is "plates stacked up to be washed": the last plate stacked will be the first to be washed.

Figure: Simple representation of a stack (Wikipedia)

To "push" data onto the stack and "pop" data from it, x86 assembly uses the instructions PUSH and POP.

Push Instruction

PUSH decrements the Stack Pointer (SP / ESP) and then stores a value at the new top of the stack.

• PUSH AX
• PUSH BX
• PUSH 1986

This first pushes AX onto the stack, then BX, then the value 1986; 1986 is the value that will be "popped" first.

Pop Instruction

POP loads the value stored at the location pointed to by SP into its operand, and then increments the Stack Pointer.

• POP AX
• POP BX
• POP CX

Assuming AX = 1 and BX = 2 before the pushes in the Push example, the topmost element, 1986, is stored in AX. BX then receives 2 (the value that was pushed from BX), and CX receives 1 (the value that was pushed from AX). Now the stack is empty.
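
The Last In, First Out behaviour can be replayed with a toy model in Python. The sketch below pushes AX, BX and 1986 as in the Push example, then pops three times. The class Stack16, the starting SP value 0x0100 and the 2-byte step are assumptions chosen for a 16-bit stack, not details taken from a real program.

```python
class Stack16:
    """Toy 16-bit stack: SP goes down on PUSH and up on POP."""

    def __init__(self, sp=0x0100):
        self.sp = sp        # hypothetical initial Stack Pointer
        self.memory = {}    # address -> 16-bit value

    def push(self, value):
        self.sp -= 2                        # decrement SP first
        self.memory[self.sp] = value & 0xFFFF

    def pop(self):
        value = self.memory.pop(self.sp)    # read the top of the stack
        self.sp += 2                        # then increment SP
        return value


stack = Stack16()
ax, bx = 1, 2
stack.push(ax)      # PUSH AX
stack.push(bx)      # PUSH BX
stack.push(1986)    # PUSH 1986
ax = stack.pop()    # POP AX
bx = stack.pop()    # POP BX
cx = stack.pop()    # POP CX
print(ax, bx, cx, hex(stack.sp))  # 1986 2 1 0x100
```

The last value pushed (1986) comes out first, and SP returns to its starting value once every pushed value has been popped.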

Flags, Conditional jumps, and Comparisons

Flags

Flags are indicators that many instructions can alter; they describe the result of logical, arithmetic and comparison instructions.

Flags are grouped into the Flags Register, a 16-bit register (extended to 32 bits as EFLAGS, which is where RF and VM are located). Bits are numbered from 0, the least significant bit:

• Bit 0: CF
• Bit 1: reserved (always 1)
• Bit 2: PF
• Bit 3: reserved (always 0)
• Bit 4: AF
• Bit 5: reserved (always 0)
• Bit 6: ZF
• Bit 7: SF
• Bit 8: TF
• Bit 9: IF
• Bit 10: DF
• Bit 11: OF
• Bits 12-13: IOPL
• Bit 14: NT
• Bit 15: reserved (always 0)
• Bit 16: RF
• Bit 17: VM
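
When a debugger shows the flags register as a single number, the individual flags can be recovered by testing each bit. The Python sketch below assumes the conventional numbering in which the least significant bit is bit 0 (CF); the table FLAG_BITS and the function name are made up for the illustration.

```python
# Bit position -> flag name, counting from bit 0 (least significant bit).
FLAG_BITS = {0: "CF", 2: "PF", 4: "AF", 6: "ZF", 7: "SF",
             8: "TF", 9: "IF", 10: "DF", 11: "OF"}


def flags_set(flags_value):
    """Return the names of the flags whose bit is 1, lowest bit first."""
    return [name for bit, name in sorted(FLAG_BITS.items())
            if (flags_value >> bit) & 1]


print(flags_set(0x0246))  # ['PF', 'ZF', 'IF']
```

In the value 0x0246, bit 1 is also 1, but it is a reserved bit and therefore has no name in the table.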

The most widely used flags are CF, PF, AF, ZF, SF, IF, DF and OF. They are used as follows:

• CF – Carry Flag: affected by the result of arithmetic instructions, "used to indicate when an arithmetic carry or borrow has been generated out of the most significant ALU bit position." (Wikipedia)
• PF – Parity Flag: takes the value 1 if the low byte of the result contains an even number of bits set to 1.
• AF – Auxiliary Flag (or Adjust Flag): "indicates when an arithmetic carry or borrow has been generated out of the 4 least significant bits." (Wikipedia)
• ZF – Zero Flag: used to check the result of arithmetic operations. If the result of an operation is equal to 0, ZF takes the value 1; it is frequently used to test the result of a subtraction.
• SF – Sign Flag: takes the value of the most significant bit of the last result, so it is 1 if that result is negative when read as a signed number.
• IF – Interrupt Flag: when set to 1, IF lets the CPU handle hardware interrupts; when set to 0, the CPU ignores such interrupts.
• DF – Direction Flag: controls the direction in which pointers move (in string processing, for example, left to right / right to left).
• OF – Overflow Flag: indicates whether a signed overflow occurred during an operation, and may also be used to correct some mathematical operation errors in case of overflow (if there is an overflow, OF takes the value 1).

Flags are directly related to conditional statements, which leads us to introduce conditional jumps before talking about comparisons.
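
To see how the arithmetic flags follow from a result, here is a hedged Python model of an 8-bit addition. It is a simplified sketch of the usual definitions of CF, PF, AF, ZF, SF and OF, not an emulator; the function name add8_flags is invented for the example.

```python
def add8_flags(a, b):
    """Add two 8-bit values; return (result, flags) as an 8-bit ADD would."""
    a &= 0xFF
    b &= 0xFF
    full = a + b
    result = full & 0xFF
    flags = {
        "CF": int(full > 0xFF),                       # carry out of bit 7
        "PF": int(bin(result).count("1") % 2 == 0),   # even number of 1 bits
        "AF": int((a & 0xF) + (b & 0xF) > 0xF),       # carry out of bit 3
        "ZF": int(result == 0),                       # result is zero
        "SF": result >> 7,                            # copy of the top bit
        # Signed overflow: both operands differ in sign from the result.
        "OF": int(((a ^ result) & (b ^ result) & 0x80) != 0),
    }
    return result, flags


print(add8_flags(0x7F, 0x01))
# (128, {'CF': 0, 'PF': 0, 'AF': 1, 'ZF': 0, 'SF': 1, 'OF': 1})
print(add8_flags(0xFF, 0x01))
# (0, {'CF': 1, 'PF': 1, 'AF': 1, 'ZF': 1, 'SF': 0, 'OF': 0})
```

The first call overflows as a signed addition (127 + 1) without producing a carry, while the second produces a carry (255 + 1) without a signed overflow, which is the difference between OF and CF.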

Conditional jumps

This part is interesting because it helps us understand how a program reacts to the result of most operations, as recorded in the flags (1 or 0).

Flag   Value   Jump   Meaning
CF     1       JB     Jump if Below
CF     1       JC     Jump if Carry
CF     1       JNAE   Jump if Not Above or Equal
CF     0       JAE    Jump if Above or Equal
CF     0       JNB    Jump if Not Below
CF     0       JNC    Jump if Not Carry
ZF     1       JE     Jump if Equal
ZF     1       JZ     Jump if Zero
ZF     0       JNE    Jump if Not Equal
ZF     0       JNZ    Jump if Not Zero
PF     1       JP     Jump if Parity
PF     1       JPE    Jump if Parity Even
PF     0       JNP    Jump if Not Parity
PF     0       JPO    Jump if Parity Odd
OF     1       JO     Jump if Overflow
OF     0       JNO    Jump if Not Overflow
SF     1       JS     Jump if Signed
SF     0       JNS    Jump if Not Signed

It is also worth adding the jumps that depend on more than one flag:

Condition             Jump   Meaning
CF = 1 or ZF = 1      JBE    Jump if Below or Equal
CF = 1 or ZF = 1      JNA    Jump if Not Above
CF = 0 and ZF = 0     JA     Jump if Above
CF = 0 and ZF = 0     JNBE   Jump if Not Below or Equal
ZF = 0 and SF = OF    JG     Jump if Greater
ZF = 0 and SF = OF    JNLE   Jump if Not Less or Equal
SF = OF               JGE    Jump if Greater or Equal
SF = OF               JNL    Jump if Not Less
SF ≠ OF               JL     Jump if Less
SF ≠ OF               JNGE   Jump if Not Greater or Equal
ZF = 1 or SF ≠ OF     JLE    Jump if Less or Equal
ZF = 1 or SF ≠ OF     JNG    Jump if Not Greater
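
The flag conditions behind the most common jumps can be written down as small predicates. The Python sketch below covers only a subset of the mnemonics and uses the standard x86 conditions; the dictionary JUMP_CONDITIONS and the function taken are names chosen for this illustration.

```python
# Each entry maps a mnemonic to the flag condition under which it jumps.
JUMP_CONDITIONS = {
    "JE":  lambda f: f["ZF"] == 1,
    "JNE": lambda f: f["ZF"] == 0,
    "JB":  lambda f: f["CF"] == 1,
    "JAE": lambda f: f["CF"] == 0,
    "JBE": lambda f: f["CF"] == 1 or f["ZF"] == 1,
    "JA":  lambda f: f["CF"] == 0 and f["ZF"] == 0,
    "JL":  lambda f: f["SF"] != f["OF"],
    "JGE": lambda f: f["SF"] == f["OF"],
    "JLE": lambda f: f["ZF"] == 1 or f["SF"] != f["OF"],
    "JG":  lambda f: f["ZF"] == 0 and f["SF"] == f["OF"],
}


def taken(flags):
    """Return, in alphabetical order, the jumps taken for these flag values."""
    return sorted(name for name, cond in JUMP_CONDITIONS.items() if cond(flags))


# Flags as they would be after comparing two equal values.
print(taken({"CF": 0, "ZF": 1, "SF": 0, "OF": 0}))
# ['JAE', 'JBE', 'JE', 'JGE', 'JLE']
```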

For a jump to "decide" whether it is taken or not, a test or comparison must first be made using instructions like:

CMP instruction

CMP compares two operands but does not store a result. With this instruction, the program tests two values by subtracting them (it subtracts the second operand from the first) and, depending on the result, updates the flags (the flags affected are OF, SF, ZF, AF, PF and CF). For instance, if the two values are equal, the Zero Flag holds the value 1; otherwise it holds 0. CMP behaves like SUB, another mathematical instruction, except that the result is discarded.

• CMP AX, BX

Here CMP computes AX - BX. If the result of this subtraction is equal to zero, then AX is equal to BX, and ZF is set to 1.
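
The effect of CMP on the flags can be modelled as a subtraction whose result is thrown away. This Python sketch assumes 16-bit operands, as with AX and BX, and reports only four of the affected flags; cmp16 is a hypothetical helper name.

```python
def cmp16(a, b):
    """Model CMP a, b on 16-bit operands: compute a - b, keep only flags."""
    a &= 0xFFFF
    b &= 0xFFFF
    result = (a - b) & 0xFFFF
    return {
        "ZF": int(result == 0),     # operands are equal
        "CF": int(a < b),           # a borrow was needed (unsigned a < b)
        "SF": result >> 15,         # top bit of the discarded result
        # Signed overflow: operands differ in sign and the result's sign
        # differs from the first operand's sign.
        "OF": int(((a ^ b) & (a ^ result) & 0x8000) != 0),
    }


print(cmp16(5, 5))  # {'ZF': 1, 'CF': 0, 'SF': 0, 'OF': 0}
print(cmp16(3, 5))  # {'ZF': 0, 'CF': 1, 'SF': 1, 'OF': 0}
```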

To make it easier, after a CMP the jumps are TAKEN when:

• First operand is bigger than the second (unsigned numbers) -> JA
• First operand is lower than the second (unsigned numbers) -> JB
• First operand is bigger than the second (signed numbers) -> JG
• First operand is lower than the second (signed numbers) -> JL
• Equality (signed and unsigned numbers) -> JE or JZ

Just add "N" after "J" to get the negative / opposite instruction (JA / JNA, JB / JNB…), so JNA is taken when the first operand is NOT bigger than the second (unsigned numbers), JNB when it is NOT lower, and so on.
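
The difference between the unsigned jumps (JA / JB) and the signed ones (JG / JL) shows up when the same bit pattern is read both ways. The Python sketch below works at the level of meaning rather than flags, assuming 16-bit operands; both function names are invented for the example.

```python
def to_signed16(value):
    """Read a 16-bit pattern as a signed (two's complement) number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def jumps_after_cmp(a, b):
    """Which of JA, JB, JG, JL, JE would be taken after CMP a, b."""
    ua, ub = a & 0xFFFF, b & 0xFFFF          # unsigned reading
    sa, sb = to_signed16(a), to_signed16(b)  # signed reading
    taken = []
    if ua > ub:
        taken.append("JA")
    if ua < ub:
        taken.append("JB")
    if sa > sb:
        taken.append("JG")
    if sa < sb:
        taken.append("JL")
    if ua == ub:
        taken.append("JE")
    return taken


print(jumps_after_cmp(0xFFFF, 1))  # ['JA', 'JL']
```

0xFFFF is 65535 when read as unsigned but -1 when read as signed, so JA and JL are both taken after the same comparison.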

We have reached the end of this first part, in which we covered the basics of registers, the stack, flags, conditional jumps and the comparison instruction CMP. The next part will focus mainly on mathematical and logical instructions and on memory.

Soufiane Tahiri

Soufiane Tahiri is an InfoSec Institute contributor and computer security researcher, specializing in reverse code engineering and software security. He is also the founder of www.itsecurity.ma and has practiced reversing for more than 8 years. Dynamic and very involved, Soufiane is ready to take any serious opportunity to be part of a workgroup.

Contact Soufiane in whatever way works for you:

Email: soufianetahiri@gmail.com

Twitter: https://twitter.com/i7s3curi7y

LinkedIn: http://ma.linkedin.com/in/soufianetahiri

Website: http://www.itsecurity.ma