Disassembly 101
Introduction
This article briefly covers assembly basics: registers, operands, instructions, arithmetic instructions, logical instructions, stack instructions, conditionals and jump instructions. We’ll conclude with reasons why assembly language is still relevant despite the evolution of high-level languages.
This article is written for professionals, students and self-learners who want to learn the key aspects of assembly programming. It should give you enough understanding to read and follow simple assembly code.
Computer programming language
A computer programming language is any of various languages used to express a set of detailed instructions for a digital computer. Programming languages are broadly classified into three categories: machine, assembly and high-level languages.
Machine language
Machine language is also known as machine code. It is a sequence of bit patterns that’s used for providing instructions to the processor of a computer. These sequences of binary digits are not human-readable.
Assembly language
Assembly language expresses instructions to the computer using simple mnemonic abbreviations. Since the computer cannot directly understand assembly language, a translator is required to convert the instructions into machine language.
The utility program that converts assembly language source code into machine language, so that the Central Processing Unit (CPU) can execute it, is known as an assembler. The reverse conversion, from machine language back into assembly language, is performed by a translator called a disassembler.
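To make the two directions of translation concrete, here is a minimal Python sketch of a toy assembler and disassembler. It handles only one instruction form, an x86 MOV of a one-byte value into an 8-bit register, and the function names assemble_mov and disassemble_mov are hypothetical names chosen for this illustration. Real tools cover far more than this.

```python
# Toy translator for a single instruction form: MOV r8, imm8.
# The list order follows the x86 numbering of the 8-bit registers.
REG8 = ["AL", "CL", "DL", "BL", "AH", "CH", "DH", "BH"]


def assemble_mov(reg, value):
    """Assemble 'MOV reg, value' into two machine-code bytes."""
    if not 0 <= value <= 0xFF:
        raise ValueError("immediate must fit in one byte")
    opcode = 0xB0 | REG8.index(reg.upper())  # bit pattern 1011 0 rrr
    return bytes([opcode, value])


def disassemble_mov(code):
    """Turn two machine-code bytes back into assembly text."""
    opcode, value = code[0], code[1]
    if opcode & 0xF8 != 0xB0:
        raise ValueError("not a MOV r8, imm8 instruction")
    return f"MOV {REG8[opcode & 0x07]}, {value:02X}h"


machine_code = assemble_mov("AL", 0x4D)
print(machine_code.hex())             # b04d
print(disassemble_mov(machine_code))  # MOV AL, 4Dh
```

The round trip works here because this instruction form maps one-to-one onto its bytes. Labels and comments in a source file are not stored in machine code, which is why a disassembler cannot recover them.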
High-level language
High-level languages express instructions to the computer using simple English words and mathematical symbols. They are sometimes described as human-readable languages because they are the furthest from machine language. The translators that convert high-level language into machine language are called compilers and interpreters.
Assembly Basics
Structure of a computer system
The basic structure of a computer system is made up of the CPU, main memory and the input/output peripherals. The CPU itself is made up of registers, a control unit and an arithmetic and logic unit (ALU).
Registers
The CPU carries out all of the computer's tasks and operations. To do this effectively, it needs fast storage to hold the instructions it receives and the data it is working on. This storage is called a register. A register may hold an instruction, the address of another storage location, or any kind of data, such as the binary code of a character.
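As a rough illustration of a register as small, fixed-width storage, the following Python sketch models a 16-bit register whose low and high bytes can be read separately, the way AL and AH are the two halves of AX on x86. The class name Register16 is hypothetical and exists only for this example.

```python
class Register16:
    """A 16-bit register whose low and high bytes can be read separately."""

    def __init__(self, value=0):
        self.value = value & 0xFFFF  # anything above 16 bits is discarded

    @property
    def low(self):
        """Low byte, e.g. AL when the register is AX."""
        return self.value & 0xFF

    @property
    def high(self):
        """High byte, e.g. AH when the register is AX."""
        return (self.value >> 8) & 0xFF


ax = Register16(0x124D)
print(hex(ax.low), hex(ax.high))  # 0x4d 0x12
print(chr(ax.low))                # M  (0x4D is the character code of 'M')
print(hex(Register16(0x1FFFF).value))  # 0xffff
```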
Instructions
Assembly language instructions come in two parts: the operation code (opcode), which names the operation, and the operands, which are the data to be operated on. A typical instruction has two operands, the target operand and the source operand. The target operand is normally a register, while the source operand supplies a value.
Assembly code example
MOV AL, 4Dh ; load register AL with 77 decimal (4D hex)
Equivalent binary code
10110000 01001101
1011 is the binary code (opcode) of the instruction 'MOV'
0 specifies whether the data is a byte ('0') or full size, 16/32 bits ('1')
000 is the binary identifier of the register 'AL'
01001101 is the binary representation of the decimal 77
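The breakdown above can be checked with a few shifts and masks. This is a minimal Python sketch; decode_fields is a hypothetical helper, and the field layout it assumes is exactly the one listed above for this particular instruction, not a general x86 decoder.

```python
def decode_fields(opcode_byte, immediate):
    """Split the first byte into the bit fields listed above."""
    return {
        "opcode": format(opcode_byte >> 4, "04b"),  # top four bits
        "w": (opcode_byte >> 3) & 1,                # 0 = byte, 1 = full size
        "reg": format(opcode_byte & 0b111, "03b"),  # register identifier
        "immediate": immediate,                     # data byte as an integer
    }


fields = decode_fields(0b10110000, 0b01001101)
print(fields)
# {'opcode': '1011', 'w': 0, 'reg': '000', 'immediate': 77}
```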
Arithmetic instructions
Below are some sample arithmetic instructions in assembly language.
- INC AL ; Increment by 1 the value in AL, the low byte of the primary accumulator AX
- DEC AL ; Decrement by 1 the value in AL, the low byte of the primary accumulator AX
- ADD AX, BX ; Add the values stored in the primary accumulator AX and the base register BX, then store the sum in AX
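Because registers have a fixed width, results that do not fit wrap around. The Python sketch below models the three instructions above under that assumption. The helper names inc8, dec8 and add16 are hypothetical, and only the carry out of ADD is modeled, not the full set of processor flags.

```python
def inc8(al):
    """INC AL: add 1, wrapping from 255 back to 0."""
    return (al + 1) & 0xFF


def dec8(al):
    """DEC AL: subtract 1, wrapping from 0 back to 255."""
    return (al - 1) & 0xFF


def add16(ax, bx):
    """ADD AX, BX: return the 16-bit sum and whether a carry occurred."""
    total = ax + bx
    return total & 0xFFFF, total > 0xFFFF


print(inc8(0x4D))         # 78
print(inc8(0xFF))         # 0
print(dec8(0x00))         # 255
print(add16(2, 3))        # (5, False)
print(add16(0xFFFF, 1))   # (0, True)
```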
Logical instructions
Assembly language logical instructions operate on a bit-by-bit basis: each bit of the result depends only on the corresponding bits of the operands, so no overflow or carry is generated. Typical logical operations include logical and (AND), logical or (OR), logical complement (NOT) and logical exclusive or (XOR). The AND operation can be used for clearing one or more bits in a register.
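The four operations can be tried out on an 8-bit value with Python's bitwise operators. This is only a sketch: the helper names are hypothetical, and the mask with 0xFF is there because Python integers are not limited to 8 bits the way a register such as AL is.

```python
def and8(a, b):
    """AND: a result bit is 1 only where both operands have a 1."""
    return a & b & 0xFF


def or8(a, b):
    """OR: a result bit is 1 where either operand has a 1."""
    return (a | b) & 0xFF


def xor8(a, b):
    """XOR: a result bit is 1 where the operands differ."""
    return (a ^ b) & 0xFF


def not8(a):
    """NOT: flip every bit, keeping the result to 8 bits."""
    return ~a & 0xFF


al = 0b10110110
print(format(and8(al, 0b11110000), "08b"))  # 10110000 (low four bits cleared)
print(format(or8(al, 0b00000001), "08b"))   # 10110111 (bit 0 set)
print(format(xor8(al, 0b11111111), "08b"))  # 01001001 (every bit toggled)
print(format(not8(al), "08b"))              # 01001001
```

Clearing bits with AND works by putting a 0 in the mask at every position to be cleared and a 1 at every position to be kept.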
Stack instructions
A stack is a top-down structure in memory that stores data in such a way that the last item stored is the first to be retrieved (last in, first out). Data can be added or removed only at the top of the stack. The most common stack instructions are PUSH and POP. PUSH puts new data on the top of the stack, while POP removes the item currently at the top of the stack.
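The last-in, first-out behaviour can be sketched in Python with a small block of memory and a stack pointer. The Stack class is hypothetical, and it assumes the x86 convention in which the stack grows toward lower addresses, so PUSH decreases the stack pointer and POP increases it.

```python
class Stack:
    """A fixed-size stack that grows toward lower addresses."""

    def __init__(self, size=8):
        self.memory = [0] * size
        self.sp = size  # empty stack: pointer sits just past the last slot

    def push(self, value):
        """PUSH: move the stack pointer down, then store the value."""
        if self.sp == 0:
            raise OverflowError("stack overflow")
        self.sp -= 1
        self.memory[self.sp] = value

    def pop(self):
        """POP: read the value at the top, then move the pointer up."""
        if self.sp == len(self.memory):
            raise IndexError("stack underflow")
        value = self.memory[self.sp]
        self.sp += 1
        return value


stack = Stack()
for item in (1, 2, 3):
    stack.push(item)
print([stack.pop() for _ in range(3)])  # [3, 2, 1]
```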
Conditionals
Assembly language jump instructions control the flow of execution of the program. They come in two kinds: unconditional jumps and conditional jumps.
An unconditional jump is performed by the JMP instruction. The CMP instruction compares two operands and sets the appropriate flags, depending on the outcome. A conditional jump instruction reads the flags set by CMP and jumps only if they match its condition.
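The link between CMP and a conditional jump can be sketched in Python. The sketch assumes 16-bit unsigned operands and models only three of the flags. The names cmp16 and je_taken are hypothetical. On x86, CMP performs a subtraction, discards the result and keeps only the flags, which is what the function imitates.

```python
def cmp16(dest, src):
    """Return the flags a 16-bit CMP would set; the operands are not changed."""
    result = (dest - src) & 0xFFFF
    return {
        "ZF": result == 0,            # zero flag: the operands are equal
        "CF": dest < src,             # carry flag: unsigned borrow occurred
        "SF": bool(result & 0x8000),  # sign flag: top bit of the result
    }


def je_taken(flags):
    """JE (jump if equal) is taken when the zero flag is set."""
    return flags["ZF"]


print(cmp16(1, 1))            # {'ZF': True, 'CF': False, 'SF': False}
print(cmp16(0, 1))            # {'ZF': False, 'CF': True, 'SF': True}
print(je_taken(cmp16(1, 1)))  # True
print(je_taken(cmp16(5, 1)))  # False
```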
Unconditional jump
- MOV AX, 10 ; Initialize AX to 10
- MOV BX, 11 ; Initialize BX to 11
- MOV CX, 00 ; Initialize CX to 0
- L17:
- ADD AX, 01 ; Increment AX
- ADD BX, AX ; Add AX to BX
- JMP L17 ; Jump back to label L17 and repeat the two statements above
- ADD BX, AX ; Add AX to BX (this line will never run, because the unconditional jump JMP L17 always transfers control back to L17)
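The loop above never ends on its own, so the following Python sketch traces it for a limited number of passes only. The function name run_loop and the max_iterations cut-off are additions for this illustration. The operand values are read as decimal, and registers are treated as 16 bits wide.

```python
def run_loop(max_iterations):
    """Trace the unconditional-jump listing for a limited number of passes."""
    ax, bx, cx = 10, 11, 0           # MOV AX, 10 / MOV BX, 11 / MOV CX, 00
    for _ in range(max_iterations):  # stands in for the endless JMP L17
        ax = (ax + 1) & 0xFFFF       # ADD AX, 01
        bx = (bx + ax) & 0xFFFF      # ADD BX, AX
    # The ADD that follows JMP L17 is never reached, so nothing models it here.
    return ax, bx, cx


print(run_loop(1))  # (11, 22, 0)
print(run_loop(3))  # (13, 47, 0)
```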
Conditional jump
- CMP DX, 01 ; Compare the value in DX with one
- JE L17 ; If they are equal, jump to label L17 (this conditional jump skips the next two instructions only if the value in register DX is one)
- ADD AX, 01 ; Increment AX
- ADD BX, AX ; Add AX to BX
- L17:
- ADD AX, 11 ; Add 11 to AX
- ADD BX, AX ; Add AX to BX
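The two possible paths through this listing can be followed in a short Python sketch. The function name run_conditional is hypothetical, and because the listing does not initialize AX or BX, the sketch assumes both start at 0. The operand 11 is read as decimal.

```python
def run_conditional(dx, ax=0, bx=0):
    """Follow the conditional-jump listing for a given value of DX."""
    if dx != 1:                  # CMP DX, 01 / JE L17: fall through if not equal
        ax = (ax + 1) & 0xFFFF   # ADD AX, 01
        bx = (bx + ax) & 0xFFFF  # ADD BX, AX
    # L17: both paths continue from here
    ax = (ax + 11) & 0xFFFF      # ADD AX, 11
    bx = (bx + ax) & 0xFFFF      # ADD BX, AX
    return ax, bx


print(run_conditional(dx=1))  # (11, 11)  jump taken, two instructions skipped
print(run_conditional(dx=0))  # (12, 13)  jump not taken
```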
Conclusion
The ability to read and write low-level assembly language is a valuable skill despite the evolution of high-level languages. Assembly language is used in writing device drivers, real-time systems and low-level embedded systems. It is also essential in reverse engineering, where disassembled code is studied to uncover vulnerabilities or trace the logical flow of a program in a real-world running environment.