
Reverse Engineering If Statements

February 28, 2013 by Dejan Lukan

Introduction

Summary: In this article, we'll present a simple program that uses 'if' statements and then reverse engineer its compiled version to learn how to recognize an if statement in assembly code. This is purely an exercise in determining when and where an executable makes a decision, which is usually implemented with branches.

The If-Else Statement

First, let's take a look at a single branch and how it's compiled into low-level code. Let's start with the simple C++ program presented below:

[cpp]
#include "stdafx.h"
#include <stdio.h>

int _tmain(int argc, _TCHAR* argv[]) {
    int x = 1;

    /* single two-way branch */
    if (x == 1) {
        printf("Condition True!\n");
    }
    else {
        printf("Condition False!\n");
    }

    /* wait */
    getchar();
    return 0;
}
[/cpp]

The code was written and compiled in Visual Studio 2010. When we compile and run the application, "Condition True!" will be displayed to us.

Let's now open the executable in IDA and find the code we've written above. The whole function can be seen on the graph below:

We won't go into details about what is being done in the function, but the stack frame is obviously initialized first. Let's present a bigger picture where we can see the actual instructions (note that only the middle part of the above graph is displayed, because that's the part that actually implements the 'if' statement). The picture can be seen below:

The value 1 is stored on the stack at [ebp+x], where x holds the value -8, so the actual address is [ebp-08h]. The stored value is then compared with 1; the comparison is always true here, which is why the left block is executed and prints "Condition True!" to the console window. The left block also ends with an additional jmp instruction that jumps over the second block, which shouldn't be executed. The same if-else condition is easier to follow in IDA's plain disassembly view than in graph mode. This can be seen on the picture below, where we can see why the jmp instruction is important: if we deleted it, execution would fall through into the next block and "Condition False!" would be displayed as well.
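The layout described above can be sketched in C++ itself by replacing the structured if-else with goto statements, so that each line corresponds to one step of the compiled pattern. This is only an illustration under assumptions: the function name `if_else_lowered`, the labels, and returning the message instead of calling printf are invented here, and the `jnz` in the comments is the inverted test a compiler would typically emit for `x == 1`, not something read from the article's listing.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not from the article): the if-else rewritten with
// goto so the control flow matches the compare / jump / jmp layout.
std::string if_else_lowered(int x) {
    std::string out;
    if (x != 1) goto else_block;   // cmp [ebp+x], 1 then an inverted jump (assumed jnz)
    out = "Condition True!";       // "then" block
    goto end_if;                   // the jmp that skips the else block
else_block:
    out = "Condition False!";      // "else" block
end_if:
    return out;
}

// Small self-check showing how the helper is meant to be used.
void check_if_else_lowered() {
    assert(if_else_lowered(1) == "Condition True!");
    assert(if_else_lowered(0) == "Condition False!");
}
```

Removing the `goto end_if;` line would let execution fall from the "then" block straight into `else_block`, which is the same effect as deleting the jmp in the disassembly.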

The If-Elseif-Else Statement

In this use-case scenario we'll take a look at the if-elseif-else conditional statements to see how the high-level C++ code translates into the low-level assembly code.

Let's again present the C++ code that we used:

[cpp]
#include "stdafx.h"
#include <stdio.h>

int _tmain(int argc, _TCHAR* argv[]) {
    int x = 0;

    /* three-way branch on the sign of x */
    if (x < 0) {
        printf("Negative number!\n");
    }
    else if (x > 0) {
        printf("Positive number!\n");
    }
    else {
        printf("Zero number!\n");
    }

    /* wait */
    getchar();
    return 0;
}
[/cpp]

We can see that this time we're assigning the number 0 to the variable x and then testing whether the variable holds a negative, positive or zero number. When we compile and run the program, the message "Zero number!" is displayed, as we can see below:

Let's again load the compiled executable into IDA and look at the code representing our conditional statements. The graph view of the conditional statements can be seen on the picture below:

We're storing the value 0 into [ebp+x] and comparing it with 0. Then the jge (jump if greater or equal) instruction jumps to the right block if the number is greater than or equal to 0, which is true in our case, since the number is 0. The green arrow marks the path followed when the conditional jump is taken (the condition is true), while the red arrow marks the path followed when the condition is false and execution falls through. We can see all the colors used by the graph view in IDA if we go to Options – Colors – Graph, which is shown on the picture below:

In the right block, we're comparing the stored number with 0 again and then using the jle (jump if less or equal) instruction to check whether the number is less than or equal to 0. The previous jge has already ruled out negative numbers (otherwise, as the original listing notes, we would have gone to the address 0x00411A7B at the previous conditional), so this jump is taken only when the number is exactly zero. As the number is zero, the condition again evaluates to true and the green arrow is followed, which is why the "Zero number!" message is printed in the console window. The normal disassembly is presented below, where we can clearly see what the assembly code looks like when if-elseif-else conditional statements are used.
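The two conditional jumps described above can be mirrored in C++ with goto statements. This is a sketch under assumptions rather than the compiler's actual output: the function name `classify_lowered` and the labels are invented for illustration, and the message is returned instead of printed so that the result is easy to check.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not from the article): if / else if / else rewritten
// so that each test is the inverted jump named in the text.
std::string classify_lowered(int x) {
    std::string out;
    if (x >= 0) goto check_positive;   // cmp [ebp+x], 0 / jge: skip the "x < 0" block
    out = "Negative number!";
    goto done;                         // jmp over the remaining blocks
check_positive:
    if (x <= 0) goto is_zero;          // cmp [ebp+x], 0 / jle: skip the "x > 0" block
    out = "Positive number!";
    goto done;                         // jmp over the else block
is_zero:
    out = "Zero number!";              // else block, reached only when x == 0
done:
    return out;
}

// Small self-check showing how the helper is meant to be used.
void check_classify_lowered() {
    assert(classify_lowered(0) == "Zero number!");
}
```

Note that `is_zero` is reachable only after both `x >= 0` and `x <= 0` have held, which is why no explicit comparison with zero for equality appears anywhere.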

On the picture above we can see that every conditional block ends with an additional jump instruction that jumps over the rest of the conditional blocks (including the else block). This is needed so that only one of the conditional blocks is executed. The target of these jumps also identifies the first code that gets executed after the whole if-elseif-else statement. Also, the conditions are almost always reversed. Let's see what I mean:

  • x < 0 : jge
  • x > 0 : jle
  • x == 0 : jmp

The first conditional statement in the C++ code checks whether the value in variable x is lower than zero, while the assembly code uses the jge instruction to check whether the value in variable x is greater than or equal to zero, which is exactly the opposite condition. The elseif conditional in the C++ code checks whether the value in variable x is greater than zero, while the assembly code uses the jle instruction to check whether the value is less than or equal to 0, which is again exactly the opposite condition. The conditional blocks also end with jmp instructions, so that we jump over the else block.
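When reading a listing, this reversal can be applied mechanically: take the conditional jump that skips a block and invert it to recover the source-level test that guards the block. The lookup below is a minimal sketch of that idea; the function name `source_condition_for` is hypothetical, it covers only a handful of signed and equality jumps, and it assumes the jump really does skip the guarded block (jumps that enter a block, as can happen with the || operator, must not be inverted).

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not from the article): maps a conditional jump that
// SKIPS a block to the C++ operator that guards that block in the source.
std::string source_condition_for(const std::string& jump) {
    if (jump == "jge") return "<";                      // skip if >=  -> block runs if <
    if (jump == "jle") return ">";                      // skip if <=  -> block runs if >
    if (jump == "jg")  return "<=";                     // skip if >   -> block runs if <=
    if (jump == "jl")  return ">=";                     // skip if <   -> block runs if >=
    if (jump == "jnz" || jump == "jne") return "==";    // skip if !=  -> block runs if ==
    if (jump == "jz"  || jump == "je")  return "!=";    // skip if ==  -> block runs if !=
    return "?";                                         // not covered by this sketch
}

// Small self-check using the two jumps from the example above.
void check_source_condition_for() {
    assert(source_condition_for("jge") == "<");
    assert(source_condition_for("jle") == ">");
}
```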

Conditional Statements with And/Or

Let's take a look at the following example that uses the && and || operators to check whether the condition is true or false.

[cpp]
#include "stdafx.h"
#include <stdio.h>

int _tmain(int argc, _TCHAR* argv[]) {
    int x = 1;

    /* true as soon as any one comparison matches */
    if (x == 1 || x == 2 || x == 3) {
        printf("Number 1, 2 or 3!\n");
    }
    else {
        printf("Not number 1, 2 or 3!");
    }

    /* wait */
    getchar();
    return 0;
}
[/cpp]

Let's load the program in IDA and present the overview graph:

We're storing the number 1 in [ebp+x] and then comparing it with 1, which is always true, so execution follows the green arrow and prints "Number 1, 2 or 3!". If that weren't the case, we'd follow the red arrow and compare the number with 2 and then with 3. If any of the comparisons is true, "Number 1, 2 or 3!" is printed. Here we can clearly see that when the first condition of (x == 1 || x == 2 || x == 3) evaluates to true, the conditional block is executed without checking the other conditions; this short-circuit behavior is worth remembering when trying to optimize code. Let's also present the plain disassembly view, which lists the instructions by their virtual address:

We can see that there are two blocks waiting to be executed (and print some message with printf), but the decision about which block will be executed is done at the beginning by comparing the variable to 1, 2 and 3. We can identify these kinds of conditionals by the comparisons being done at the beginning of the code and the fact that multiple conditional jumps point to the same code block.
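The shape described above, with several comparisons up front and more than one jump landing on the same block, can be sketched with goto statements. The code below is an illustration under assumptions: the name `or_lowered`, the labels, and the `comparisons` counter are invented here, and inverting the last test so that it jumps to the else block is a common compiler layout rather than something confirmed by the article's listing.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not from the article): (x == 1 || x == 2 || x == 3)
// lowered to jumps. `comparisons` reports how many tests were executed.
std::string or_lowered(int x, int& comparisons) {
    std::string out;
    comparisons = 0;
    ++comparisons; if (x == 1) goto body;        // cmp x, 1 / jump to the shared block
    ++comparisons; if (x == 2) goto body;        // cmp x, 2 / jump to the same block
    ++comparisons; if (x != 3) goto else_block;  // cmp x, 3 / assumed inverted last test
body:
    out = "Number 1, 2 or 3!";
    goto done;                                   // jmp over the else block
else_block:
    out = "Not number 1, 2 or 3!";
done:
    return out;
}

// Small self-check: with x == 1 only the first comparison runs.
void check_or_lowered() {
    int n = 0;
    assert(or_lowered(1, n) == "Number 1, 2 or 3!");
    assert(n == 1);
}
```

The counter makes the short-circuit visible: one comparison for `x == 1`, two for `x == 2`, and three for every other value.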

Let's now use the && operator instead of ||. The conditions in the code below were changed slightly so that the whole expression can still evaluate to true, but the rest of the code is the same:

[cpp]
#include "stdafx.h"
#include <stdio.h>

int _tmain(int argc, _TCHAR* argv[]) {
    int x = 1;

    /* true only if every comparison holds */
    if (x == 1 && x != 2 && x != 3) {
        printf("Number 1!\n");
    }
    else {
        printf("Not number 1!");
    }

    /* wait */
    getchar();
    return 0;
}
[/cpp]

IDA presents the code as can be seen on the picture below:

The first comparison (cmp instruction) checks whether the number equals 1 and, if it doesn't, the jnz instruction jumps to loc_412FC0. Otherwise, execution continues with the next condition, which checks whether the number equals 2 and again jumps to loc_412FC0 if that is true. If not, it also checks whether the number equals 3 and jumps to loc_412FC0 if it does. "Number 1!" is printed only if the number equals 1 and equals neither 2 nor 3. The plain disassembly view is presented below:

We can see that the code is very similar to the code produced for the || operators. The comparisons are done at the beginning and, if all the conditions are met, "Number 1!" is printed; otherwise we jump to the "Not number 1!" block. Again, the conditions are reversed: the jnz tests whether the number is not 1, while the original code checks whether the number is 1. The same is true for the other two conditions.
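The same goto-based rewriting shows the && pattern: every test is inverted and every jump targets the single else block. This is a sketch, not the compiler's output; the function name `and_lowered` and the labels are invented for illustration, and `else_block` plays the role of loc_412FC0 from the description above.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not from the article): (x == 1 && x != 2 && x != 3)
// lowered to jumps. Any failed test leaves for the else block at once.
std::string and_lowered(int x) {
    std::string out;
    if (x != 1) goto else_block;   // cmp x, 1 / jnz loc_412FC0 (inverse of x == 1)
    if (x == 2) goto else_block;   // cmp x, 2 / jump if equal (inverse of x != 2)
    if (x == 3) goto else_block;   // cmp x, 3 / jump if equal (inverse of x != 3)
    out = "Number 1!";             // reached only if all three tests passed
    goto done;                     // jmp over the else block
else_block:                        // corresponds to loc_412FC0
    out = "Not number 1!";
done:
    return out;
}

// Small self-check showing how the helper is meant to be used.
void check_and_lowered() {
    assert(and_lowered(1) == "Number 1!");
    assert(and_lowered(2) == "Not number 1!");
}
```

Compared with the || case, the distinguishing mark is the jump target: with && all conditional jumps lead to the else block, whereas with || the early jumps lead into the body.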

Conclusion

We've seen how 'if' conditionals are compiled into assembly code. First, we took a look at simple if-else conditionals and then at the more complex if-elseif-else statements. After that, we analyzed the && and || operators. The point of this exercise was to see how conditionals are translated from high-level code structures to low-level assembly code. IDA does a great job of presenting those statements in its graph view, but we're often faced with a debugger that cannot display the code as a graph, which makes reverse engineering conditional statements a little harder, but not much harder. If you've carefully followed what's been written in this article, you should be able to reconstruct high-level conditionals from their low-level assembly representation with ease.

References

[1] Chris Eagle, The IDA Pro Book: The unofficial guide to the world's most popular disassembler.

Dejan Lukan

Dejan Lukan is a security researcher for InfoSec Institute and a penetration tester from Slovenia. He is very interested in finding new bugs in real-world software products with source code analysis, fuzzing and reverse engineering. He also has a great passion for developing his own simple scripts for security-related problems and learning about new hacking techniques. He knows a great deal about programming languages, as he can write in a couple of dozen of them. His other passions are antivirus bypassing techniques, malware research and operating systems, mainly Linux, Windows and BSD. He also has his own blog available here: http://www.proteansec.com/.