Virtual Machine Introspection in Malware Analysis
What is Virtual Machine Introspection?
The word "introspection" generally refers to the observation and examination of one's own mental and emotional state: the act of looking at oneself. Virtual machine introspection, by contrast, is the practice of monitoring a virtual machine's activity from the hypervisor and inspecting it without running anything inside it. No agent is installed on the virtual machine; everything is done from the outside.
Why Virtual Machine Introspection?
One of the main reasons for using introspection in malware detection is that malware using advanced techniques, such as rootkits, often goes undetected by traditional automated malware-detection systems. The other reason is the depth of visibility the technology provides: it gives the analyst detailed insight into every action taking place in the virtual machine.
Thanks to introspection, the analysis does not need to run inside the malware's environment: the processes of interest are monitored from outside the virtual machine, from the hypervisor. In addition, malware that uses debugger-detection techniques will not find any debugger, because the introspection system only interacts with the virtual machine's memory and never attaches to the processes running on the machine. Similarly, it is possible to mislead malware that employs sandbox-detection techniques.
As a result, applying introspection to malware analysis offers clear advantages over traditional automated-analysis technologies.
Let's discuss some aspects of virtual machine introspection.
Architecture of Introspection
Types of Hypervisors
We generally consider two types of hypervisors:
- Native hypervisor: Also known as Type-1 hypervisor or "Bare Metal." This type of hypervisor works directly on the hardware of the machine to control and manage virtual machines. Examples of Type-1 hypervisors include Xen and KVM.
- Hosted hypervisor: Also known as a Type-2 hypervisor. This type of hypervisor runs on top of a conventional host operating system. Examples of Type-2 hypervisors include VMware Workstation and VirtualBox.
Fig1. Example of native and hosted hypervisors
The following figure illustrates how introspection works on a Type-1 hypervisor. The introspection tools installed on the hypervisor make it possible to monitor and control everything that happens on the virtual machine (system activity, network, applications, and so forth).
Fig2. Introspection in native hypervisor
Memory Mapping
On a non-virtualized machine, there are two levels of memory: virtual memory and physical memory. Under a hypervisor, there are three: the virtual memory of the virtual machine, the physical memory of the virtual machine, and the physical memory of the host machine. The virtual memory of the host machine is abstracted away in this case.
Keep in mind that a hypervisor only allocates memory to the virtual machine. By default, it has no knowledge of what is happening in the virtual machine's virtual memory; to obtain that information, additional introspection tools have to be installed alongside the hypervisor. Below is a simplified example of memory sharing with the virtual machine.
Fig3. Example of the three levels of memory addressing under hypervisor.
One of the objectives of introspection tools is to translate addresses in the virtual machine's virtual memory: first from the virtual machine's virtual memory to its physical memory, then to the physical memory of the host machine. This lets the hypervisor access the correct memory area during introspection.
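The two-stage translation described above can be sketched as follows. This is a simplified illustration, not how any particular hypervisor implements it: it assumes 4 KiB pages and replaces the real multi-level page tables with flat dictionaries. The frame numbers and the names `guest_page_table`, `host_frame_map` and `translate` are hypothetical. On Intel hardware, the second stage corresponds to the Extended Page Tables (EPT), and a real introspection tool has to walk the guest's page tables by reading guest physical memory.

```python
PAGE_SHIFT = 12                      # assumption: 4 KiB pages
PAGE_MASK = (1 << PAGE_SHIFT) - 1

# Stage 1 (hypothetical values): guest page table maintained by the guest OS,
# guest virtual page number -> guest physical frame number.
guest_page_table = {0x400: 0x1A2}

# Stage 2 (hypothetical values): mapping maintained by the hypervisor,
# guest physical frame number -> host physical frame number.
host_frame_map = {0x1A2: 0x9F3}


def translate(gva: int) -> tuple[int, int]:
    """Translate a guest virtual address to (guest physical, host physical)."""
    offset = gva & PAGE_MASK         # the offset within the page never changes

    gfn = guest_page_table.get(gva >> PAGE_SHIFT)
    if gfn is None:
        raise LookupError(f"GVA {gva:#x} is not mapped by the guest")
    gpa = (gfn << PAGE_SHIFT) | offset

    hfn = host_frame_map.get(gpa >> PAGE_SHIFT)
    if hfn is None:
        raise LookupError(f"GPA {gpa:#x} is not backed by host memory")
    hpa = (hfn << PAGE_SHIFT) | offset

    return gpa, hpa


gpa, hpa = translate(0x400ABC)
print(hex(gpa), hex(hpa))  # 0x1a2abc 0x9f3abc
```

Only the page-frame number changes at each stage. The offset within the page is carried through unchanged, which is why each table only needs to map whole pages.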
Introspection in Malware Analysis
Nowadays, traditional malware-analysis technologies are often ineffective at detecting the latest malware attacks. Thanks to hypervisors, however, malware monitoring can be placed at kernel level. Hypervisors also avoid conflicts with security components that block in-guest monitoring techniques such as kernel API hooking: PatchGuard (Kernel Patch Protection), for example, triggers a Blue Screen of Death on Windows if low-level kernel components have been altered. In addition, introspection can help the analyst cover all types of malware, including rootkits.
Because the hypervisor allocates the memory of each virtual machine, it can access that memory directly to retrieve the information we are interested in. This can be done in several ways:
Monitoring via Breakpoints
The idea is to place breakpoints (a technique commonly used in reverse engineering) on the APIs we want to monitor inside the virtual machine when the introspection system initializes, while keeping a lookup table of the breakpoint offsets and the original bytes they overwrote. Once the introspection system is running, any malware that calls one of these APIs is interrupted, and a notification is sent to the hypervisor, which lets us determine which executable hit the breakpoint.
Once the desired information (the function and its arguments) has been retrieved, we resume the malware's execution (so as not to alter its behavior) until the next breakpoint. From this information, we can reconstruct the behavior of the analyzed malware.
Used on its own, this technique has two weaknesses. First, the malware may compute a checksum over its own code to detect whether it has been altered; since breakpoints modify that content, the malware may stop executing and never reveal its malicious behavior. Second, monitoring kernel APIs this way can cause a BSOD on Windows, because Windows performs integrity checks on the kernel and may crash if it has been modified.
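The installation step and the lookup table can be sketched on simulated guest memory. This is a minimal sketch under stated assumptions: a `bytearray` stands in for guest memory that a real tool would read and write through the hypervisor. The class `BreakpointTable` and its methods are hypothetical names. Identifying the process by the vCPU's CR3 register (the page-table base of the running address space) is one common approach, assumed here for illustration.

```python
INT3 = 0xCC  # x86 one-byte software breakpoint opcode


class BreakpointTable:
    """Hypothetical sketch: breakpoints on a simulated guest memory buffer."""

    def __init__(self, guest_memory: bytearray, processes_by_cr3: dict):
        self.mem = guest_memory
        self.processes_by_cr3 = processes_by_cr3  # CR3 value -> process name
        self.entries = {}                         # address -> (API name, original byte)

    def install(self, api_name: str, address: int) -> None:
        """Save the original byte in the lookup table, then write INT3."""
        self.entries[address] = (api_name, self.mem[address])
        self.mem[address] = INT3

    def on_trap(self, address: int, cr3: int) -> str:
        """Handle the breakpoint notification delivered to the hypervisor."""
        api_name, _original = self.entries[address]
        process = self.processes_by_cr3.get(cr3, "<unknown>")
        return f"{process} called {api_name}"


# Usage with made-up values: pretend an API's first byte (0x48) sits at 0x40.
mem = bytearray(b"\x90" * 0x100)
mem[0x40] = 0x48
bps = BreakpointTable(mem, {0x1AB000: "sample.exe"})
bps.install("NtCreateFile", 0x40)
print(bps.on_trap(0x40, cr3=0x1AB000))  # sample.exe called NtCreateFile
```

Keeping the original byte in the table is what makes it possible to restore the instruction later and let the call proceed normally.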
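Retrieving the arguments and resuming execution can be sketched as below. The sketch assumes a 64-bit Windows guest, where the Microsoft x64 calling convention passes the first four integer or pointer arguments in RCX, RDX, R8 and R9, so they can be read from the vCPU registers at function entry. The resume sequence (restore the original byte, single-step one instruction, re-insert the breakpoint) is one common approach and is shown here as an assumption. The helper names and the register dictionary are hypothetical.

```python
INT3 = 0xCC  # x86 one-byte software breakpoint opcode

# Microsoft x64 calling convention: the first four integer/pointer arguments
# are passed in RCX, RDX, R8 and R9 (further arguments go on the stack).
ARG_REGISTERS = ("rcx", "rdx", "r8", "r9")


def read_arguments(vcpu_regs: dict, count: int) -> list:
    """Read up to four register-passed arguments from a (simulated) vCPU."""
    if count > len(ARG_REGISTERS):
        raise ValueError("stack-passed arguments are not handled in this sketch")
    return [vcpu_regs[name] for name in ARG_REGISTERS[:count]]


def resume_after_breakpoint(mem: bytearray, address: int, original: int,
                            single_step) -> None:
    """Hypothetical resume sequence for a breakpoint that has been hit."""
    mem[address] = original   # 1. restore the byte the breakpoint replaced
    single_step()             # 2. let the vCPU execute that one instruction
    mem[address] = INT3       # 3. re-arm the breakpoint for the next call


# Usage with simulated state: the breakpoint at 0x40 replaced the byte 0x48.
mem = bytearray(b"\x90" * 0x100)
mem[0x40] = INT3
regs = {"rcx": 0x1000, "rdx": 0x2000, "r8": 0x0, "r9": 0x0}

args = read_arguments(regs, 2)
seen_during_step = []
resume_after_breakpoint(mem, 0x40, 0x48,
                        single_step=lambda: seen_during_step.append(mem[0x40]))
print([hex(a) for a in args], hex(seen_during_step[0]), hex(mem[0x40]))
# ['0x1000', '0x2000'] 0x48 0xcc
```

On a guest with several vCPUs, the short window in which the original byte is restored lets another vCPU run the function unmonitored. Real tools need a different mechanism to close that gap.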
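The first weakness is easy to reproduce. The snippet below shows a hypothetical self-check of the kind a sample might perform, using CRC32 as the checksum. It is an illustrative choice, not taken from any specific malware family. CRC32 detects any change confined to a single byte, so writing one INT3 (0xCC) byte is always noticed.

```python
import zlib

# Hypothetical code bytes checked by the sample:
# sub rsp, 0x28 ; nop ; nop ; ret
code = bytearray(b"\x48\x83\xec\x28\x90\x90\xc3")
baseline = zlib.crc32(code)      # checksum the sample records at startup

code[0] = 0xCC                   # the introspection tool installs a breakpoint

tampered = zlib.crc32(code) != baseline
print(tampered)  # True
```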
Monitoring via Memory Pages
With this technique, there is no need to place breakpoints and therefore no need to alter memory. The idea is to enable memory-access events on each executable memory page and to compute the addresses of the APIs we wish to monitor. Whenever execution reaches a location in one of these pages, the event is reported, and the accessed address is compared with the addresses of the monitored APIs.
This technique solves the problem of altering memory with breakpoints but creates another: monitoring every executable page is expensive in terms of resources and considerably slows down the virtual machine. Keep in mind that the virtual machine will not have many resources to work with, especially if we want to automate the analysis across multiple virtual machines in parallel.
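The comparison step can be sketched as an event handler. This is a minimal sketch: it assumes the hypervisor reports the executed address for every execute access to a watched page. On Intel hardware this is typically done by removing execute permission from those pages in EPT. The addresses, page numbers and the function `on_execute_event` are hypothetical. Guest memory is never written.

```python
from typing import Optional

PAGE_SHIFT = 12  # assumption: 4 KiB pages

# Hypothetical API entry addresses computed by the introspection tool.
monitored_apis = {0x100040: "NtCreateFile", 0x102080: "NtWriteFile"}

# Hypothetical executable pages on which execute events are enabled.
watched_pages = {0x100, 0x101, 0x102}


def on_execute_event(address: int) -> Optional[str]:
    """Handle an execute-access event reported by the hypervisor.

    Returns the API name when execution reached a monitored entry point;
    otherwise returns None and the guest simply resumes.
    """
    if (address >> PAGE_SHIFT) not in watched_pages:
        return None  # no event is generated for unwatched pages
    return monitored_apis.get(address)


print(on_execute_event(0x100040))  # NtCreateFile
print(on_execute_event(0x100044))  # None (same page, not an API entry)
```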
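A rough way to see the cost is to count traps for the same execution trace under both techniques. The trace below is synthetic and the addresses are made up. The sketch also assumes that every instruction fetched from a watched page causes a VM exit, which is an upper bound: real implementations may reduce it. With breakpoints, only calls to monitored API entry points trap.

```python
PAGE_SHIFT = 12  # assumption: 4 KiB pages

# Hypothetical monitored API entry points and watched executable pages.
api_entries = {0x100040: "NtCreateFile", 0x102080: "NtWriteFile"}
executable_pages = {0x100, 0x101, 0x102}


def count_traps(trace):
    """Return (breakpoint traps, page-monitoring traps) for an address trace."""
    bp_traps = sum(1 for addr in trace if addr in api_entries)
    page_traps = sum(1 for addr in trace
                     if (addr >> PAGE_SHIFT) in executable_pages)
    return bp_traps, page_traps


# Synthetic trace: one API call (50 instructions), a hot loop of 1000
# instructions in another watched page, then a second API call (30).
trace = [0x100040 + i for i in range(50)]
trace += [0x101000 + (i % 64) for i in range(1000)]
trace += [0x102080 + i for i in range(30)]

print(count_traps(trace))  # (2, 1080)
```

Both techniques observe the same two API calls. Under these assumptions, however, page monitoring interrupts the virtual machine on every instruction in the watched pages, which is where the slowdown comes from.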