x86 Assembly Language Applicable To Reverse Engineering: The Basics – Part 2

By Soufiane Tahiri, October 30, 2012

For part 1 of this series, please click here.

Introduction

In the first article we introduced the most common x86 assembly instructions encountered when disassembling and analyzing programs. We talked about registers, the stack, flags, conditional jumps and the comparison instruction CMP. In this part we continue with some mathematical and logical instructions, and some basic memory ones.

This article is intended for beginners in reverse engineering who still feel lost in front of the code shown by a disassembler.

  • Mathematical instructions
    • Multiplication : MUL / IMUL
      • MUL instruction

The CPU multiplies with either MUL (unsigned multiplication) or IMUL (signed multiplication). MUL multiplies its operand (a register or a memory operand) by AL, AX or EAX, depending on the operand size, and stores the product in AX, DX:AX or EDX:EAX respectively.

  • With AX = 3 and BX = 5
  • MUL BX
  • The result will be DX:AX = 3 x 5 = 15, that is AX = 15 and DX = 0; BX is unchanged (5)
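The 16-bit MUL above can be modeled in a few lines of Python. This is only a sketch of the instruction's arithmetic, not real assembly; the function name `mul16` and its return layout are assumptions made for illustration.

```python
def mul16(ax, operand):
    """Model of the one-operand 16-bit MUL: DX:AX = AX * operand (unsigned)."""
    product = (ax & 0xFFFF) * (operand & 0xFFFF)
    new_ax = product & 0xFFFF            # low word of the product
    new_dx = (product >> 16) & 0xFFFF    # high word of the product
    carry = int(new_dx != 0)             # CF and OF are set when DX is non-zero
    return new_ax, new_dx, carry


ax, dx, cf = mul16(3, 5)
print(f"AX={ax} DX={dx} CF={cf}")

# A product that no longer fits in 16 bits spills into DX.
ax, dx, cf = mul16(0xFFFF, 2)
print(f"AX={ax:04X}h DX={dx:04X}h CF={cf}")
```

The second call shows why the high half matters: reading AX alone would silently drop the upper bits of the product.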

    • IMUL instruction
    • It behaves in the same way as MUL, except that it is used for signed operations and preserves the sign of the product. The operands must already be correctly signed values of the chosen size; no prior sign extension is required. (The instruction CWD, convert word to doubleword, which extends the sign of AX into DX, is a must before a 16-bit IDIV, not before IMUL.)

      • With AL = 5 and BL = 12
      • IMUL BL
      • The result will be AX = 5 x 12 = 60 = 003Ch and OF = 0, since AH (00h) is the sign extension of AL (3Ch). With AL = 48 and BL = 4 instead, the product is 00C0h and OF is set to 1, since AH is no longer a sign extension of AL.
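To check how the overflow flag behaves for an 8-bit IMUL, here is a small Python model. It is a sketch under the assumption that OF is set exactly when AH is not the sign extension of AL; the helper names `to_signed` and `imul8` are hypothetical.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def imul8(al, operand):
    """Model of the one-operand 8-bit IMUL: AX = AL * operand (signed)."""
    product = to_signed(al, 8) * to_signed(operand, 8)
    ax = product & 0xFFFF
    # OF (and CF) are set when the signed product does not fit in AL alone,
    # i.e. when AH is not the sign extension of AL.
    overflow = int(to_signed(ax & 0xFF, 8) != product)
    return ax, overflow


for al, bl in ((5, 12), (48, 4), (-4, 4)):
    ax, of = imul8(al, bl)
    print(f"{al} x {bl}: AX={ax:04X}h OF={of}")
```

Under this model 5 x 12 gives 003Ch with OF clear, 48 x 4 gives 00C0h with OF set, and -4 x 4 gives FFF0h with OF clear.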
        • Division : DIV / IDIV
          • DIV instruction
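      • Division mirrors MUL and IMUL: DIV performs unsigned division. It divides AX, DX:AX or EDX:EAX by its operand (a register or a memory operand) and stores the quotient in AL, AX or EAX and the remainder in AH, DX or EDX.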

        • With DX = 0, AX = 18 and BX = 5
        • DIV BX
        • The result will be quotient AX = 3 and remainder DX = 3
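The 16-bit DIV can be modeled as follows. This is a sketch, assuming the dividend is the 32-bit value DX:AX; `div16` is a hypothetical name, and Python exceptions stand in for the CPU's divide error.

```python
def div16(dx, ax, divisor):
    """Model of the 16-bit DIV: divides DX:AX by divisor (unsigned).

    Returns (quotient, remainder), i.e. the new AX and DX.
    """
    divisor &= 0xFFFF
    if divisor == 0:
        raise ZeroDivisionError("divide error: division by zero")
    dividend = ((dx & 0xFFFF) << 16) | (ax & 0xFFFF)
    quotient, remainder = divmod(dividend, divisor)
    if quotient > 0xFFFF:
        raise OverflowError("divide error: quotient does not fit in AX")
    return quotient, remainder


quotient, remainder = div16(0, 18, 5)
print(f"AX={quotient} DX={remainder}")
```

Because DX is the high half of the dividend, leftover data in DX changes the result, which is why it is normally cleared before an unsigned 16-bit division.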

          • IDIV instruction
          • Used for signed integer division, IDIV takes the same operands as DIV. Before executing an 8-bit IDIV, AL must be sign-extended into the high-order register AH using the instruction CBW (convert byte to word).

            • With AL = -48 and BL = 5
            • MOV AL, -48 (puts -48, the dividend, into AL)
            • CBW (sign-extends AL into AH, so AX = -48)
            • MOV BL, 5 (puts 5, the divisor, into BL)
            • IDIV BL
            • The result will be quotient AL = -9 and remainder AH = -3
            • Note: we will see the MOV instruction later.
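The role of CBW becomes clearer with a small model. The Python sketch below is illustrative only (`to_signed`, `cbw` and `idiv8` are hypothetical helper names) and assumes IDIV truncates the quotient toward zero, with the remainder taking the sign of the dividend.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def cbw(al):
    """CBW: sign-extend AL into AX."""
    return to_signed(al, 8) & 0xFFFF


def idiv8(ax, divisor):
    """Model of the 8-bit IDIV: divides AX by divisor, returns (AL, AH)."""
    dividend = to_signed(ax, 16)
    d = to_signed(divisor, 8)
    if d == 0:
        raise ZeroDivisionError("divide error: division by zero")
    quotient = abs(dividend) // abs(d)        # truncate toward zero
    if (dividend < 0) != (d < 0):
        quotient = -quotient
    remainder = dividend - quotient * d       # same sign as the dividend
    if not -128 <= quotient <= 127:
        raise OverflowError("divide error: quotient does not fit in AL")
    return quotient & 0xFF, remainder & 0xFF


al, ah = idiv8(cbw(-48), 5)
print(to_signed(al, 8), to_signed(ah, 8))     # with CBW

al, ah = idiv8(-48 & 0xFF, 5)                 # AH left at 0: AX = 00D0h = 208
print(to_signed(al, 8), to_signed(ah, 8))     # without CBW
```

With CBW the model yields -9 and -3. Without it, AX is read as 208 and the division returns 41 and 3, which is the kind of mistaken result the sign extension prevents.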
              • The opposite of a number : NEG
            • A simple instruction: it takes a single destination operand and reverses its sign (two's complement negation), so "+" becomes "-" and "-" becomes "+".

              • With AX = 8
              • NEG AX
              • The result will be AX = -8

                • Floating point numbers
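              • And this is a real problem! The general-purpose registers and the integer instructions covered here cannot hold floating point numbers directly (the x87 FPU and SSE instructions, which have their own registers, are outside the scope of this article). The trick is fixed-point arithmetic: scale the fraction up to a large integer, compute with it, then divide the result back down to the expected range. This may sound cryptic at first!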

                To see how this actually works, let's suppose that we want to compute 156 x 0.5, and that we want to put 0.5 into AX, which does not accept floating point numbers. Well, let's multiply 0.5 by 256, which gives an integer: 128. Once we have our integer, we put it into AX, and now we can multiply 156 by 128, which leads to a result 256 times bigger than what we need, so we divide the result by 256. This way we get the result of 156 x 0.5 without using a single floating point value.

                Technically this sample will look like:

                • MOV AX, 128
                • MOV BX, 156
                • MUL BX
                • SHR AX, 8 (will divide the result by 2^8 which is equal to 256)
                • The result will be 156 x 128 = 19968, which divided by 256 gives 78, and this is equal to 156 x 0.5
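The scaling trick above can be sketched in Python. This is a minimal model, assuming a scale factor of 256 and a product that must fit in the 16-bit AX register (the sample ignores DX); the function name `scaled_mul` is hypothetical.

```python
def scaled_mul(value, fraction):
    """Multiply an integer by a fraction using only integer operations."""
    factor = round(fraction * 256)   # MOV AX, factor   (0.5 becomes 128)
    product = value * factor         # MUL BX
    if product > 0xFFFF:
        # The sample only shifts AX, so the product must fit in 16 bits.
        raise OverflowError("product does not fit in AX")
    return product >> 8              # SHR AX, 8        (divide by 256)


print(scaled_mul(156, 0.5))
print(scaled_mul(100, 0.25))
```

The choice of 256 is convenient because dividing by a power of two is a single shift; larger operands would need the high half of the product in DX as well.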

                  • Negative numbers
                • At school, negative numbers were really easy for us and much easier for teachers: just add a negative sign "-" and you got your negative number! Unfortunately things are a bit more complicated when it comes to x86 assembly code. In binary we cannot add "-"; there are only 0 and 1!

                  The method used, known as two's complement, consists of:

                  1. Converting the concerned number to binary.
                  2. Reversing the binary bits (replace 0 by 1 and 1 by 0)
                  3. Adding 1 to the result

                  Let's take 5 for instance. Five in decimal is equivalent to 00000101 in binary (see Table 1; actually 101 is enough, but we need to work in 8 bits). By reversing the bits we get 11111010, and 11111010 + 1 gives 11111011. So -5 in binary is equal to 11111011.
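The three steps can be sketched in Python as follows. The function name `twos_complement` is an assumption for illustration; the same operation is what NEG performs on a register.

```python
def twos_complement(value, bits=8):
    """Negate value the way the CPU does: invert the bits, then add 1."""
    mask = (1 << bits) - 1
    inverted = ~value & mask          # step 2: replace 0 by 1 and 1 by 0
    return (inverted + 1) & mask      # step 3: add 1, keep `bits` bits


print(format(5, "08b"))                        # step 1: 5 in binary
print(format(twos_complement(5), "08b"))       # -5 on 8 bits
print(format(twos_complement(8, 16), "04X"))   # NEG AX with AX = 8
```

Applying the function twice returns the original value, which matches the idea that negating a negative number gives back the positive one.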

                  Hexadecimal   Decimal   8-bit binary representation
                  0             0         00000000
                  1             1         00000001
                  2             2         00000010
                  3             3         00000011
                  4             4         00000100
                  5             5         00000101
                  6             6         00000110
                  7             7         00000111
                  8             8         00001000
                  9             9         00001001
                  A             10        00001010
                  B             11        00001011
                  C             12        00001100
                  D             13        00001101
                  E             14        00001110
                  F             15        00001111

                  Table 1: Some Common Numeral Systems

                  • Decimal: digits from 0 to 9 (base 10)
                    • Octal: digits from 0 to 7 (base 8)
                    • Hexadecimal: digits from 0 to F (base 16)
                      • Logical AND
                    • The instruction AND destination, source performs a bitwise logical AND between two values and stores the result in the destination. Each bit of the result is set to 1 if and only if the corresponding bits of both the destination and the source are 1; otherwise it is set to 0.

                      • MOV AX, 54
                      • MOV BX, 43
                      • AND AX, BX results in AX = 34
                      • Binary explanation:
                      • 00110110 (54)
                      • 00101011 (43)
                      • AND 00110110, 00101011 gives 00100010 (AX = 34)
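The AND example can be verified with a short Python sketch. The function name `and16` is hypothetical; Python's `&` operator is used here as a stand-in for the instruction on 16-bit registers.

```python
def and16(dest, src):
    """Model of AND dest, src on 16-bit registers: returns the new dest."""
    return dest & src & 0xFFFF


ax, bx = 54, 43
result = and16(ax, bx)

# Print the operands in columns so each bit position can be compared.
print(format(ax, "08b"), f"({ax})")
print(format(bx, "08b"), f"({bx})")
print(format(result, "08b"), f"({result})")
```

Only the positions where both operands hold a 1 survive, which is why AND is commonly used to mask out unwanted bits.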
                        • Logical inclusive or : OR
                        • This does an inclusive "OR" between two operands and stores the result in the destination. Each bit of the result is 0 if and only if the corresponding bits of both operands are equal to 0; otherwise it is 1.

                          • MOV AX, 12
                          • MOV BX, 26
                          • OR AX, BX results in AX = 30
                          • Binary explanation:
                          • 00001100 (12)
                          • 00011010 (26)
                          • OR 00001100, 00011010 gives 00011110 (AX = 30)

                            • Logical exclusive or : XOR
                          • Used in some cryptographic operations, it does an exclusive OR between destination and source: each bit of the result is 1 if and only if the corresponding bits of the two operands differ. XOR can also be seen as a bit-by-bit addition without carry. XOR is also used to reset the value of a register to zero; performing a XOR on a value against itself always results in zero.

                            • Case 1
                            • MOV AX, 15
                            • MOV BX, 24
                            • XOR AX, BX results in AX = 23
                            • Binary explanation:
                            • 00001111 (AX = 15)
                            • XOR 00011000 (BX = 24)
                            • 00010111 (AX = 23)
                            • Case 2
                            • XOR EAX, EAX results in EAX = 0
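Both cases, plus the reason XOR shows up in simple ciphers, can be sketched in Python. The names `xor16` and `xor_bytes`, the key 5Ah and the sample data are all assumptions chosen for illustration.

```python
def xor16(dest, src):
    """Model of XOR dest, src on 16-bit registers: returns the new dest."""
    return (dest ^ src) & 0xFFFF


def xor_bytes(data, key):
    """XOR every byte of data with a single-byte key (a toy cipher)."""
    return bytes(b ^ key for b in data)


print(xor16(15, 24))            # case 1
print(xor16(0x1234, 0x1234))    # case 2: a value XORed with itself is zero

secret = xor_bytes(b"password", 0x5A)
print(secret)
print(xor_bytes(secret, 0x5A))  # XORing with the same key restores the data
```

Because applying the same key twice cancels out, the same routine both encodes and decodes, a pattern that is easy to spot in disassembled code.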

                              • Logical NOT
                            • It does a logical negation on the specified operand and puts the result in the same operand. It inverts the value of every bit: bits that equal zero become 1, and vice versa.

                              • NOT 0 = 1
                              • NOT 1 = 0
                              • MOV AX, 15
                              • MOV BX, 25
                              • NOT AX gives AX = 1111111111110000 (FFF0h); the low byte 00001111 (15) becomes 11110000
                              • NOT BX gives BX = 1111111111100110 (FFE6h); the low byte 00011001 (25) becomes 11100110
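Since AX and BX are 16-bit registers, NOT flips all 16 bits, not only the low byte. A minimal Python sketch (the name `not16` is hypothetical) makes this visible:

```python
def not16(value):
    """Model of NOT on a 16-bit register: invert every bit."""
    return ~value & 0xFFFF


for value in (15, 25):
    result = not16(value)
    print(f"NOT {value:016b} = {result:016b} ({result:04X}h)")
```

The mask with FFFFh is needed in Python because its integers are unbounded; on the CPU the register width does that job.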
                                • Logical TEST
                                • The instruction TEST does a non-destructive AND (a logical compare): it computes the AND of its two operands, sets the flags according to the result, and discards the result, leaving both operands unchanged.

                                  Each bit of the result is 1 only if both corresponding bits of the operands are equal to 1; otherwise it is 0.

                                  • TEST AX, 1
                                  • If the lowest bit of AX is equal to 1, the result is non-zero and the Zero Flag is set to 0; otherwise the Zero Flag is set to 1.
                                    • The memory and its instructions
                                      • The instruction MOVx

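                                    To load an offset into SI (the Source Index register), in assembly we write MOV SI, OFFSET followed by the name of the data. This does not work for the segment registers ES (Extra Segment), DS (Data Segment), FS and GS, which cannot be loaded with an immediate value and must be loaded from a general-purpose register or from memory instead.

                                    To move entire memory blocks from DS:SI to ES:DI, we use MOVSB, MOVSW, or MOVSD depending on how many bits we want to move at a time.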

                                    • MOVSB: moves one byte (8 bits)
                                    • MOVSW: moves a word (16 bits)
                                    • MOVSD: moves a dword (double word of 32 bits)
                                    • If we want to move n bytes using the instruction MOVSB, we need to repeat this instruction n times. To do so we first load the Counter Register (CX) with the number of repetitions we want, then put the REP prefix in front of the instruction.

                                      Let's suppose we want to move 1000 bytes:

                                      • MOV CX, 1000 ; this configures the loop counter
                                      • REP MOVSB ; moves one byte per repetition
                                      • To save time we can move 16 bits at a time:
                                      • MOV CX, 500
                                      • REP MOVSW
                                      • To save even more time we can move the data in blocks of 32 bits:
                                      • MOV ECX, 250 ; we use ECX, the extended version of CX
                                      • REP MOVSD
                                      • This sample shows that 1000 bytes are equal to 500 words, which are equal to 250 dwords
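The three variants can be compared with a small Python model of REP MOVSx. It is a simplified sketch: memory is a flat `bytearray`, segment registers are ignored, the direction flag is assumed clear (forward copy), and the name `rep_movs` and the addresses used are hypothetical.

```python
def rep_movs(memory, si, di, cx, size):
    """Model of REP MOVSB/MOVSW/MOVSD for size = 1, 2 or 4 bytes.

    Copies `size` bytes per repetition from [si] to [di] until CX is zero.
    Returns the final (SI, DI, CX).
    """
    while cx != 0:
        memory[di:di + size] = memory[si:si + size]
        si += size
        di += size
        cx -= 1
    return si, di, cx


source = bytes(i % 256 for i in range(1000))

for size, count in ((1, 1000), (2, 500), (4, 250)):
    memory = bytearray(4096)
    memory[0:1000] = source
    si, di, cx = rep_movs(memory, si=0, di=2000, cx=count, size=size)
    copied = bytes(memory[2000:3000]) == source
    print(f"size={size} count={count} copied={copied} SI={si} DI={di} CX={cx}")
```

All three runs copy the same 1000 bytes; only the number of repetitions changes, which is the point of preferring the wider variants.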

                                        • The instruction STOSx
                                        • Quite similar to MOVx, this instruction is used to store string data. It copies the content of the accumulator (AL for STOSB, AX for STOSW, EAX for STOSD) to the memory location addressed through the Extra Segment register (ES). The destination operand is always ES:DI (ES:EDI with a 32-bit address size). So to write 50 bytes of zeros at ES:DI we have to do:

                                          • MOV CX, 50
                                          • MOV AX, 0
                                          • REP STOSB
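The fill loop can be modeled the same way in Python. This is a sketch with a flat `bytearray` as memory, segments ignored and a forward direction assumed; `rep_stos` is a hypothetical name.

```python
def rep_stos(memory, di, cx, eax, size):
    """Model of REP STOSB/STOSW/STOSD for size = 1, 2 or 4 bytes.

    Stores the low `size` bytes of EAX at [di], CX times.
    Returns the final (DI, CX).
    """
    value = (eax & ((1 << (8 * size)) - 1)).to_bytes(size, "little")
    while cx != 0:
        memory[di:di + size] = value
        di += size
        cx -= 1
    return di, cx


memory = bytearray(b"\xff" * 64)     # pretend the buffer holds old data
di, cx = rep_stos(memory, di=0, cx=50, eax=0, size=1)
print(memory[:50] == bytes(50), memory[50], di, cx)
```

The first 50 bytes are zeroed while the byte right after them keeps its old value, which is the typical buffer-clearing pattern seen in disassembly.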
                                          • Conclusion

                                            Be aware that part one and this second part do not claim to teach you assembly programming. This is a simple introduction to the most common instructions you will face when disassembling your first programs. These are the "important" instructions to know when looking at a disassembled program in order to understand, at least partially, how it works.

                                            To see what I mean by this, have a look at the picture below:

                                            Figure 1. Disassembled program on OllyDBG

                                            OllyDBG is (very basically) a tool that shows the assembly code of a compiled program and lets you execute that program step by step, that is, instruction by instruction. It is a 32-bit assembler-level analyzing debugger designed for Microsoft Windows platforms; it can trace registers, analyze code, recognize loops and API calls, etc. It can be attached to a running program and can trace every stack frame. You can find more about this tool in the reference links.

                                            1. This window contains the assembly instructions; this is where we can go step by step when running a disassembled / debugged program. Each line may change the data and values of registers, memory and the stack.
                                            2. You can see the registers, flags and memory addresses.
                                            3. You can see the stack and different values that pass through it when analyzing a program.

                                            References