Java bytecode reverse engineering
This article shows how to crack a Java executable by disassembling its bytecode. Disassembling Java bytecode means translating compiled bytecode back into a human-readable form: either a listing of JVM instructions or, with a decompiler, Java source code. Disassembly is a long-standing problem for the software industry because it enables software piracy and the revenue loss that comes with it. Security engineers have developed countermeasures against Java bytecode disassembly, including software watermarking and code obfuscation. A large part of this paper is dedicated to tactics that are commonly considered reverse engineering.
The methods presented here, however, are intended for professional software developers, and each technique is demonstrated on a custom-built application. We are not encouraging malicious hacking by presenting this article; rather, its contents help to pinpoint vulnerabilities in source code and teach the methods developers can use to shield their intellectual property from reverse engineering. We shall explain disassembly in terms of obtaining sensitive information and cracking a Java executable without having the original source code.
Prerequisite
I presume that the reader has a thorough understanding of programming, debugging, and compiling in Java on platforms such as Linux and Windows and, of course, knowledge of the JVM's inner workings. In addition, the following tools are required for bytecode reverse engineering:
- JDK (javac, javap)
- Eclipse
- JVM
- JAD
Java bytecode
Engineers usually write software in a high-level language such as Java, which is comprehensible to them but cannot be executed by the machine directly. This textual form of a computer program, known as source code, must be converted into a form that the computer can execute. Java source code is compiled into an intermediate language known as Java bytecode, which is not executed directly by the CPU but by a Java virtual machine (JVM). Compilation is the act of transforming a high-level language into a lower-level one such as machine code or bytecode. We do not need to understand Java bytecode to write Java programs, but doing so can assist debugging and help improve performance and memory usage.
The JVM is essentially a simple stack-based machine whose run-time data can be separated into several areas: the stack, the heap, the pc registers, the method area, and the native method stacks. An advantage of the virtual machine architecture is portability: any machine that implements the Java virtual machine specification is able to execute Java bytecode, in the spirit of "Write once, run anywhere." Java bytecode is not strictly tied to the Java language, and many compilers and other tools can produce it, such as the Eclipse IDE, NetBeans, and the Jasmin bytecode assembler. Another advantage of the Java virtual machine is the runtime type safety of programs. The Java virtual machine specification defines the required behavior of a Java virtual machine but does not prescribe implementation details. An implementation can therefore be designed differently for different platforms, as long as it adheres to the specification.
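To make the phrase "stack-based machine" concrete, here is a toy model of an operand stack. It is only a sketch: the class name OperandStackSketch and the run method are invented for illustration, only a handful of instructions are modeled, and a real JVM is implemented very differently.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class OperandStackSketch {
    /** Executes a tiny subset of JVM-style instructions on an operand stack. */
    static int run(String... instructions) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (String ins : instructions) {
            switch (ins) {
                case "iconst_0": stack.push(0); break;            // push the int constant 0
                case "iconst_1": stack.push(1); break;            // push the int constant 1
                case "dup":      stack.push(stack.peek()); break; // duplicate the top value
                case "pop":      stack.pop(); break;              // discard the top value
                case "ireturn":  return stack.pop();              // return the top value
                default:
                    throw new IllegalArgumentException("unsupported instruction: " + ins);
            }
        }
        throw new IllegalStateException("no ireturn reached");
    }

    public static void main(String[] args) {
        System.out.println(run("iconst_1", "ireturn"));                    // prints 1
        System.out.println(run("iconst_0", "iconst_1", "pop", "ireturn")); // prints 0
    }
}
```

Every instruction takes its inputs from the top of the stack and leaves its result there, which is why the listings later in this article contain no register names.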
Sample cracked application
The following Java console application, "LoginTest", was developed in order to demonstrate Java bytecode disassembly. It authenticates users through a simple user name and password login. Suppose we obtained this application as an unregistered user and therefore do not possess its source code. As a result, we do not know a valid user name and password, which are provided only to registered users.
Even without the source code or a set of login credentials, we can still manage to log in by disassembling the bytecode, which exposes the sensitive information related to user login.
Disassemble bytecode
Disassembling is the reverse of compilation: it transforms a low-level representation back into a higher-level, human-readable one, and the standard, well-documented structure of Java bytecode makes this comparatively easy. We run a disassembler (or a decompiler, which goes all the way back to source code) on the given bytecode, just as running a compiler yields bytecode from source code. Disassembly is used to ascertain the implementation logic in the absence of documentation and source code, which is why vendors often explicitly prohibit disassembling and reverse engineering in their license agreements. Here are some of the reasons to decompile:
- Fixing critical bugs in the software for which no source code exists.
- Troubleshooting software or a JAR file that does not have proper documentation.
- Recovering the source code that was accidentally lost.
- Learning the implementation of a mechanism.
- Learning to protect your code from reverse engineering.
The process of disassembling Java bytecode is quite simple, far less complex than for a native C/C++ binary. The first step is to compile the Java source file, which has the *.java extension, with the javac utility; this produces a *.class file in which the bytecode resides. Then, using javap, a built-in utility of the JDK, we can disassemble the bytecode from the corresponding *.class file. The javap utility prints its listing to standard output, which can be redirected to a file (a *.bc file, for example) if it should be saved.
Opening a *.class file does not by itself give us access to the implementation logic. If we open the class file generated by javac in Notepad or any other text editor, we find only strange, incomprehensible data. The following figure displays the .class file's data:
Opening the class file in a text editor therefore gets us nowhere. A hex editor such as WinHex does better: it shows the implementation as hexadecimal bytes, along with the strings used in the application. Although we can reveal sensitive information of a Java application this way, the approach is laborious, because unless we know how to match each hex byte to the corresponding instruction, we can't obtain much information.
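The disassembly step can also be driven from Java itself. The sketch below assumes a full JDK, version 11 or later, and uses the standard java.util.spi.ToolProvider lookup to run javap in-process. The class name JavapRunner, the helper method javap, and the output file name javap-output.bc are hypothetical choices for this illustration. Saving the listing to a file is an explicit step in this sketch.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.spi.ToolProvider;

public class JavapRunner {
    /** Runs javap in-process with the given arguments and returns everything it printed. */
    static String javap(String... args) {
        ToolProvider tool = ToolProvider.findFirst("javap")
                .orElseThrow(() -> new IllegalStateException("javap not found; a full JDK is required"));
        StringWriter buffer = new StringWriter();
        PrintWriter writer = new PrintWriter(buffer);
        // Normal output and error messages are collected in the same buffer.
        int exitCode = tool.run(writer, writer, args);
        writer.flush();
        if (exitCode != 0) {
            throw new IllegalStateException("javap failed: " + buffer);
        }
        return buffer.toString();
    }

    public static void main(String[] args) throws Exception {
        // Without arguments, disassemble a class that every JDK ships with.
        String[] javapArgs = args.length > 0 ? args : new String[] {"-c", "java.lang.Object"};
        String listing = javap(javapArgs);
        Files.writeString(Path.of("javap-output.bc"), listing); // hypothetical file name
        System.out.print(listing);
    }
}
```

Running `java JavapRunner -c LoginTest` from the directory that contains LoginTest.class should produce the same listing as the javap command line.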
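Part of what a hex editor shows can be recovered programmatically. Every class file starts with the magic number 0xCAFEBABE and two version fields, followed by a constant pool in which names and string literals are stored as CONSTANT_Utf8 entries. The following sketch walks that pool and collects the Utf8 entries. The class name ClassStrings and its method are invented for this example. The result also contains method names and type descriptors, not only string literals.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class ClassStrings {
    /** Returns every CONSTANT_Utf8 entry in the constant pool of a class file. */
    static List<String> utf8Constants(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        if (data.readInt() != 0xCAFEBABE) {
            throw new IOException("not a class file: bad magic number");
        }
        data.readUnsignedShort();             // minor_version
        data.readUnsignedShort();             // major_version
        int count = data.readUnsignedShort(); // constant_pool_count; valid indexes are 1..count-1
        List<String> strings = new ArrayList<>();
        for (int i = 1; i < count; i++) {
            int tag = data.readUnsignedByte();
            switch (tag) {
                case 1:  strings.add(data.readUTF()); break; // Utf8: u2 length + modified UTF-8 bytes
                case 3: case 4: skip(data, 4); break;        // Integer, Float
                case 5: case 6: skip(data, 8); i++; break;   // Long, Double occupy two pool slots
                case 7: case 8: case 16: case 19: case 20:
                    skip(data, 2); break;                    // Class, String, MethodType, Module, Package
                case 9: case 10: case 11: case 12: case 17: case 18:
                    skip(data, 4); break;                    // member refs, NameAndType, (Invoke)Dynamic
                case 15: skip(data, 3); break;               // MethodHandle
                default: throw new IOException("unknown constant pool tag " + tag);
            }
        }
        return strings;
    }

    private static void skip(DataInputStream data, int n) throws IOException {
        data.readFully(new byte[n]); // readFully fails loudly on a truncated file
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ClassStrings <file.class>");
            return;
        }
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            utf8Constants(in).forEach(System.out::println);
        }
    }
}
```

Run against a class file, this prints the same readable strings that stand out in a hex view, but without the need to match bytes to structures by hand.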
Reversing bytecode
It is relatively easy to disassemble the bytecode of a Java application, compared to other binaries. The javap utility that ships with the JDK plays a significant role in disassembling Java bytecode and in revealing sensitive information. It accepts the name of a class (or a *.class file) as an argument, as follows:
[java]
Drive:> javap LoginTest
[/java]
Once this command is executed, it prints the structure of the class. Note, however, that by default it displays only the signatures of the methods declared in the source code, not their bodies, as follows:
[java]
Compiled from "LoginTest.java"
public class LoginTest
{
public LoginTest();
public static void main(java.lang.String[]);
static boolean verify(java.lang.String, char[]);
}
[/java]
The complete disassembly, including the bytecode instructions (opcodes) of every method, is produced by the javap -c switch, as follows:
[java]
Drive:> javap -c LoginTest
[/java]
This command dumps the entire bytecode of the program as a sequence of opcode instructions. The meaning of each instruction in the context of this program is explained in a later section of this paper. The important section, from which we can obtain critical information, begins at offset 62.
[java]
Compiled from "LoginTest.java"
public class LoginTest {
public LoginTest();
Code:
0: aload_0
1: invokespecial #1 // Method java/lang/Object."<init>":()V
4: return
public static void main(java.lang.String[]);
Code:
0: invokestatic #2 // Method java/lang/System.console:()Ljava/io/Console;
3: astore_1
4: getstatic #3 // Field java/lang/System.out:Ljava/io/PrintStream;
7: ldc #4 // String Login Verification
9: invokevirtual #5 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
12: getstatic #3 // Field java/lang/System.out:Ljava/io/PrintStream;
15: ldc #6 // String ************************
17: invokevirtual #5 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
20: aload_1
21: ldc #7 // String Enter username:
23: iconst_0
24: anewarray #8 // class java/lang/Object
27: invokevirtual #9 // Method java/io/Console.printf:(Ljava/lang/String;[Ljava/lang/Object;)Ljava/io/Console;
30: pop
31: aload_1
32: invokevirtual #10 // Method java/io/Console.readLine:()Ljava/lang/String;
35: astore_2
36: aload_1
37: ldc #11 // String Enter password:
39: iconst_0
40: anewarray #8 // class java/lang/Object
43: invokevirtual #9 // Method java/io/Console.printf:(Ljava/lang/String;[Ljava/lang/Object;)Ljava/io/Console;
46: pop
47: aload_1
48: invokevirtual #12 // Method java/io/Console.readPassword:()[C
51: astore_3
52: getstatic #3 // Field java/lang/System.out:Ljava/io/PrintStream;
55: ldc #13 // String -------------------------
57: invokevirtual #5 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
60: aload_2
61: aload_3
62: invokestatic #14 // Method verify:(Ljava/lang/String;[C)Z
65: ifeq 79
68: getstatic #3 // Field java/lang/System.out:Ljava/io/PrintStream;
71: ldc #15 // String Status::Login Succesfull
73: invokevirtual #5 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
76: goto 87
79: getstatic #3 // Field java/lang/System.out:Ljava/io/PrintStream;
82: ldc #16 // String Status::Login Failed
84: invokevirtual #5 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
87: getstatic #3 // Field java/lang/System.out:Ljava/io/PrintStream;
90: ldc #13 // String -------------------------
92: invokevirtual #5 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
95: getstatic #3 // Field java/lang/System.out:Ljava/io/PrintStream;
98: ldc #17 // String !!!Thank you!!!
100: invokevirtual #5 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
103: return
…
}
[/java]
From offset 62, we can easily conclude that the login mechanism is implemented by a method called verify, which checks the user-entered username and password. If verify returns true, the "Status::Login Succesfull" message is printed; otherwise, the ifeq instruction at offset 65 jumps to offset 79 and "Status::Login Failed" is printed:
[java]
62: invokestatic #14 // Method verify:(Ljava/lang/String;[C)Z
65: ifeq 79
68: getstatic #3 // Field java/lang/System.out:Ljava/io/PrintStream;
71: ldc #15 // String Status::Login Succesfull
73: invokevirtual #5 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
76: goto 87
79: getstatic #3 // Field java/lang/System.out:Ljava/io/PrintStream;
82: ldc #16 // String Status::Login Failed
[/java]
But we still do not have the username and password. If we analyze the instructions of the verify method, however, we can easily see that the username and password are hard-coded in the code itself, as the ldc instructions at offsets 10 and 19 show:
[java]
static boolean verify(java.lang.String, char[]);
Code:
0: new #18 // class java/lang/String
3: dup
4: aload_1
5: invokespecial #19 // Method java/lang/String."<init>":([C)V
8: astore_2
9: aload_0
10: ldc #20 // String ajay
12: invokevirtual #21 // Method java/lang/String.equals:(Ljava/lang/Object;)Z
15: ifeq 29
18: aload_2
19: ldc #22 // String test
21: invokevirtual #21 // Method java/lang/String.equals:(Ljava/lang/Object;)Z
24: ifeq 29
27: iconst_1
28: ireturn
29: iconst_0
30: ireturn
}
[/java]
We finally come to the conclusion that this program accepts ajay as the username and test as the password; both values are the string constants loaded by the ldc instructions.
Now launch the application once again and enter these credentials. Bingo! We have successfully subverted the login authentication mechanism without even having the source code.
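Read back into Java, the two listings correspond roughly to the source below. This is a hand-written reconstruction, not the original source: the class name LoginTestSketch and the local variable names are guesses, and the null check on the console is an addition that does not appear in the bytecode.

```java
import java.io.Console;

public class LoginTestSketch {
    public static void main(String[] args) {
        Console console = System.console();
        if (console == null) { // guard added for this sketch; the listing has no such check
            System.out.println("No interactive console available");
            return;
        }
        System.out.println("Login Verification");
        System.out.println("************************");
        console.printf("Enter username:");
        String user = console.readLine();
        console.printf("Enter password:");
        char[] password = console.readPassword();
        System.out.println("-------------------------");
        if (verify(user, password)) {                       // offsets 62-65: invokestatic, ifeq 79
            System.out.println("Status::Login Succesfull"); // spelling as in the string constant
        } else {
            System.out.println("Status::Login Failed");
        }
        System.out.println("-------------------------");
        System.out.println("!!!Thank you!!!");
    }

    static boolean verify(String user, char[] pwd) {
        String password = new String(pwd);                        // offsets 0-8: new, dup, invokespecial
        if (user.equals("ajay") && password.equals("test")) {     // offsets 9-24: ldc + equals, twice
            return true;                                          // offsets 27-28: iconst_1, ireturn
        }
        return false;                                             // offsets 29-30: iconst_0, ireturn
    }
}
```

The credentials are visible because string literals are stored in clear text in the constant pool. Comparing against a value that is not embedded in the class file would not leave them in the listing in this form.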
Bytecode instruction specification
Like assembly programming, the Java machine code representation consists of bytecode opcodes, the instructions that the JVM executes on any platform. Each opcode is one byte in length, which allows for at most 256 distinct mnemonics. Java bytecode instructions fall into these major categories:
- Load and store
- Method invocation and return
- Control transfer
- Arithmetical operation
- Type conversion
- Object manipulation
- Operand stack management
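We shall only discuss the opcode instructions that are used in the previous Java binary. The following table lists each opcode with its hex value and its meaning:
Opcode | Hex | Meaning
aload | 19 | Load an object reference from the local variable at the given index
aload_0 | 2a | Load an object reference from local variable 0
aload_1 | 2b | Load an object reference from local variable 1
aload_2 | 2c | Load an object reference from local variable 2
anewarray | bd | Create a new array of object references
astore | 3a | Store an object reference into the local variable at the given index
astore_0 | 4b | Store an object reference into local variable 0
astore_1 | 4c | Store an object reference into local variable 1
astore_2 | 4d | Store an object reference into local variable 2
dup | 59 | Duplicate the value on top of the operand stack
getstatic | b2 | Get the value of a static field
goto | a7 | Branch unconditionally
invokespecial | b7 | Invoke a constructor, private method, or superclass method
invokestatic | b8 | Invoke a static method
invokevirtual | b6 | Invoke an instance method
ifeq | 99 | Branch if the int value on top of the stack is zero
iconst_0 | 03 | Push the int constant 0
iconst_1 | 04 | Push the int constant 1
ireturn | ac | Return an int from a method
ldc | 12 | Push an item from the constant pool
pop | 57 | Discard the value on top of the operand stack
return | b1 | Return void from a method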
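The hex values above are enough to build a very small disassembler. The sketch below handles only the opcodes listed in this section and prints them in a javap-like form. The class name MiniDisassembler and its structure are invented for illustration; a real tool would cover the full instruction set and resolve constant pool references.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MiniDisassembler {
    private static final Map<Integer, String> NAMES = new HashMap<>();
    private static final Map<Integer, Integer> OPERAND_BYTES = new HashMap<>();

    private static void op(int code, String name, int operandBytes) {
        NAMES.put(code, name);
        OPERAND_BYTES.put(code, operandBytes);
    }

    static {
        op(0x03, "iconst_0", 0);      op(0x04, "iconst_1", 0);
        op(0x12, "ldc", 1);           op(0x19, "aload", 1);
        op(0x2a, "aload_0", 0);       op(0x2b, "aload_1", 0);   op(0x2c, "aload_2", 0);
        op(0x3a, "astore", 1);
        op(0x4b, "astore_0", 0);      op(0x4c, "astore_1", 0);  op(0x4d, "astore_2", 0);
        op(0x57, "pop", 0);           op(0x59, "dup", 0);
        op(0x99, "ifeq", 2);          op(0xa7, "goto", 2);
        op(0xac, "ireturn", 0);       op(0xb1, "return", 0);
        op(0xb2, "getstatic", 2);     op(0xb6, "invokevirtual", 2);
        op(0xb7, "invokespecial", 2); op(0xb8, "invokestatic", 2);
        op(0xbd, "anewarray", 2);
    }

    /** Turns the raw bytes of a method body into "offset: mnemonic operand" lines. */
    static List<String> disassemble(byte[] code) {
        List<String> lines = new ArrayList<>();
        int pc = 0;
        while (pc < code.length) {
            int opcode = code[pc] & 0xFF;
            String name = NAMES.get(opcode);
            if (name == null) {
                throw new IllegalArgumentException("opcode not covered by this sketch: 0x"
                        + Integer.toHexString(opcode));
            }
            int width = OPERAND_BYTES.get(opcode);
            int operand = 0;
            for (int i = 1; i <= width; i++) {
                operand = (operand << 8) | (code[pc + i] & 0xFF); // operands are big-endian
            }
            String text;
            if (width == 0) {
                text = name;
            } else if (opcode == 0x99 || opcode == 0xa7) {
                text = name + " " + (pc + (short) operand); // signed offset relative to this instruction
            } else if (opcode == 0x19 || opcode == 0x3a) {
                text = name + " " + operand;                // local variable index
            } else {
                text = name + " #" + operand;               // constant pool index
            }
            lines.add(pc + ": " + text);
            pc += 1 + width;
        }
        return lines;
    }

    public static void main(String[] args) {
        // Bytes of the default constructor shown in the javap -c listing.
        byte[] constructor = {0x2a, (byte) 0xb7, 0x00, 0x01, (byte) 0xb1};
        disassemble(constructor).forEach(System.out::println);
    }
}
```

For the five constructor bytes, the sketch prints the offsets 0, 1, and 4 with aload_0, invokespecial #1, and return, matching the javap listing apart from the comments that javap adds by resolving constant pool entries.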
In brief
This paper illustrated how to disassemble Java bytecode in order to reveal sensitive information when the source of a Java binary is not available. We have seen how to apply such reverse engineering tactics using JDK utilities. The article also covered the importance of bytecode disassembly and of the JVM's internal workings in the context of reversing bytecode, and it explained the meaning of the essential bytecode opcodes. Finally, we have seen how to subvert login authentication on a live Java console application by applying disassembly tactics. In the forthcoming paper, we shall explain how to patch Java bytecode in the context of reverse engineering.