Comparison of Cloud Automated Malware Analysis Tools

February 20, 2015 by Dejan Lukan

Cloud malware analysis services

In this section we list cloud-based automated malware analysis tools that are no longer available, either because the website is offline or because the creators of the analysis environment discontinued the service.

Although quite a few analysis tools are no longer available, many are still actively supported and developed. The online malware analysis tools that are still present on the Internet are presented below. Each tool is assigned a letter code in square brackets, which is used later to refer to the tool in a table, saving space and making the results clearer. Each tool is also listed with the URL where the service is available, in case you want to submit files for analysis.


Note that there are other cloud malware analysis platforms that we didn't take into consideration in this article, so some of them are not presented or described below.

Supported file formats and document types

Since malware can be hidden in almost any file format or document type, a malware analysis tool must support a given format in order to detect a threat inside it. For example, if an attacker has hidden a malicious payload inside a PDF document, the tool must have PDF support to be able to parse the document. Without PDF support, the document cannot be dissected, and consequently the tool will not be able to find the malicious payload. Through the eyes of a malware analysis tool that lacks PDF support, the document is just a sequence of seemingly random bytes.

Attackers mostly use the file formats, document types and other elements presented below to carry malicious payloads. Most of them need no introduction, since we use them in our everyday lives, but we still provide a brief explanation of each.

  • exe: Windows PE executable files normally used for Windows executable programs.
  • elf: Linux ELF executable files normally used for Linux executable programs.
  • mach-o: Mac OS X Mach-O executable files normally used for Mac executable programs.
  • apk: Android APK executable files
  • url: URLs
  • pdf: PDF documents
  • doc/docx: DOC/DOCX documents
  • ppt/pptx: PPT/PPTX documents
  • xls/xlsx: XLS/XLSX documents
  • htm/html: HTM/HTML web pages
  • jar: JAR Java executable files
  • rtf: RTF documents
  • dll: DLL libraries
  • db: DB database files
  • png/jpg: PNG/JPG images
  • zip/rar: ZIP/RAR archives
  • cpl: Control Panel Applets
  • ie: Analysis of the Internet Explorer process when opening a URL
  • ps1: PowerShell scripts
  • python: Python scripts
  • vbs: VBScript files
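As a rough illustration of the first step an analysis tool has to take, the sketch below guesses a file's format from its leading "magic" bytes. The function name identify_format and the signature table are illustrative assumptions, not a description of how any of the services discussed here works. A real tool would go on to parse the full structure, since container formats such as docx, jar and apk all share the ZIP signature.

```python
from typing import Optional

# Leading "magic" bytes for a few of the formats listed above.
# The table is deliberately small and is an illustrative assumption.
MAGIC_SIGNATURES = [
    (b"MZ", "exe/dll"),                              # Windows PE (DOS header)
    (b"\x7fELF", "elf"),                             # Linux ELF
    (b"\xcf\xfa\xed\xfe", "mach-o"),                 # 64-bit little-endian Mach-O
    (b"\xce\xfa\xed\xfe", "mach-o"),                 # 32-bit little-endian Mach-O
    (b"%PDF", "pdf"),
    (b"{\\rtf", "rtf"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpg"),
    (b"Rar!\x1a\x07", "rar"),
    (b"PK\x03\x04", "zip"),                          # also docx/pptx, jar and apk containers
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "ole2"),   # legacy doc/ppt compound files
]


def identify_format(data: bytes) -> Optional[str]:
    """Return a coarse format label for the given bytes, or None if unknown."""
    for magic, name in MAGIC_SIGNATURES:
        if data.startswith(magic):
            return name
    return None


if __name__ == "__main__":
    print(identify_format(b"%PDF-1.4\n..."))
    print(identify_format(b"no known signature"))
```

A tool that returns None here has no parser to hand the file to, which is exactly the situation in which the file remains an opaque sequence of bytes.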

The table below presents the file formats and document types supported by each cloud automated malware analysis service. The rows represent file formats or document types, while each column corresponds to one of the automated malware analysis tools, identified by its one- or two-letter code (as introduced earlier). A mark in a cell denotes that the file format or document type is supported by that service, while an empty cell indicates otherwise. The * marks a document type for which support was being implemented but was not yet available at the time of this writing.

Table 1: supported document types by different malware analysis tools

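Columns (tools): A, C, D, E, J, M, MS, TE, TT, V, X

Rows (document types): exe, elf, mach-o, apk, url, pdf, doc/docx, ppt/pptx, xls/xlsx, rtf, htm/html, jar, dll, db, png/jpg, zip/rar, cpl, ie, ps1, python, vbs

(The per-cell support marks are not reproduced here; the only surviving entry is a * in the elf row.)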

I've spent quite some time putting together the table above, which summarizes the file formats, document types and other kinds of elements that can be analyzed in an automated fashion. From the table, we can quickly determine that no single service can analyze every kind of file, because malicious code is embedded in files and documents in profoundly different ways. When adding malicious code to an executable file, we can do so by including malicious assembly instructions in its .text section, and that is only one of the ways of doing it. On the other hand, when including malicious code in a Word document (doc/docx), we usually include it in the form of a malicious macro, which gets executed by Microsoft Word upon opening the document.

Below we group the file formats, document types and other elements from the table above into categories. For each category we also briefly discuss how the malicious code gets executed and what is needed for cloud automated malware analysis of such code.

  • Executable Files [exe, elf, mach-o, apk, dll]: malicious executable files are distributed around the Internet and downloaded by users in the form of cracked software programs and cracked games. Users download a program believing it to be something they want, which it is, but additional code containing a malicious payload is usually appended to the file. The payload gets executed on the user's computer and infects it.
  • Documents [pdf, doc/docx, ppt/pptx, xls/xlsx, rtf]: vulnerabilities are discovered in different software programs on a daily basis. If an attacker finds a vulnerability in Acrobat Reader (which supports the pdf file format) or in Microsoft Office or OpenOffice (which support doc/docx, ppt/pptx, xls/xlsx and rtf), the attacker can craft a document that the program cannot process correctly and that causes it to crash instead. Depending on the type of vulnerability, the attacker may be able to execute a malicious payload included in the document.
  • Web browser [url, htm/html, jar, ie]: web browsers contain vulnerabilities just as PDF readers and office suites do. An attacker can therefore create a malicious website that the web browser is not able to handle, which leads to the browser crashing, during which the attacker can execute arbitrary code.
  • Archives [zip/rar]: archives can be used to distribute malicious files around the Internet. If a malicious file is put inside a password-protected archive, the usual analysis solutions won't be able to look inside the archive and determine whether it contains malicious files.
  • Images [png/jpg]: an attacker can hide a malicious payload inside an image, which can then be processed by a vulnerable web application running on an incorrectly set up web server. An analysis solution should therefore be able to parse various image file formats to determine whether an image contains anything out of the ordinary, like a malicious payload.
  • Code [python, vbs, ps1]: an attacker can also distribute malicious code written in a programming or scripting language, which is later processed by some application on the victim's machine. An example is a macro in a Word document that launches a PowerShell (ps1) script, which gets executed when the user allows the execution of macros upon opening the malicious document in Microsoft Word.
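The archive case above can be made concrete with a small check. The following is a minimal sketch, assuming the submitted archive is a ZIP file already held in memory; the helper name encrypted_members is hypothetical. It relies on bit 0 of each entry's general-purpose flag, which the ZIP format uses to mark an encrypted member and which Python's standard zipfile module exposes as ZipInfo.flag_bits.

```python
import io
import zipfile


def encrypted_members(data: bytes) -> list[str]:
    """Return the names of ZIP members whose 'encrypted' flag is set."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        # Bit 0 of the general-purpose flag marks an encrypted entry.
        return [
            info.filename
            for info in archive.infolist()
            if info.flag_bits & 0x1
        ]


if __name__ == "__main__":
    # Build a small unencrypted archive in memory and inspect it.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("sample.txt", b"hello")
    print(encrypted_members(buffer.getvalue()))
```

A service could use a check of this kind to report that an archive cannot be inspected without a password, rather than silently treating it as clean.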

Techniques for Detecting Automated Environments

Various techniques exist for detecting automated malware analysis environments, and malware authors incorporate them into their samples. When a malware binary runs such checks and concludes that it is executing in a controlled environment, it usually does not perform its malicious actions.

The picture below presents an overview of malware and techniques it can use to detect if it's being executed in an automated environment. In order to make the picture clearer, we'll describe the process in detail.

Once malware has infected the system, it can be running in user mode or kernel mode, depending on the exploitation techniques used. Malware usually runs in user mode, but there are multiple techniques it can use to gain the additional privileges needed to execute in kernel mode. Regardless of the mode, there are multiple techniques malware can use to detect whether it is being executed in an automated malware analysis environment. At the highest level, the techniques are divided into the following categories:

  • Detect a Debugger: debuggers are mostly used when a malware analyst manually inspects a malware sample in order to understand what it does. Debuggers are not frequently used in automated malware analysis, but different techniques can still be incorporated into a malware sample to make debugging it more difficult.
  • Anti-Disassembly Tricks: this category isn't directly related to automated malware analysis environments. When an analyst manually reviews a malware sample in a debugger or disassembler, the malware can use different techniques to confuse disassembly engines into producing incorrect disassembled code. This is only useful against manual analysis and doesn't have much impact in automated malware analysis environments.
  • Detect a Sandbox Environment: a sandbox is an environment separate from the main operating system where malware samples can be run without causing any harm to the rest of the system. The primary purpose of a sandbox environment is to emulate different parts of the system, or the whole system, in order to separate the guest system from the host system. Depending on the virtualization layer, there are different types of sandboxes, which are presented below.

  • Virtualized Programs: Chromium Sandbox, Sandboxie
  • Linux Containers: LXC, Docker
  • Virtualized Environment: VirtualPC, VMware, VirtualBox, QEMU
Each automated malware analysis tool uses a different backend system to run the malware in a controlled environment. Malware can be run on physical machines or virtual machines. Note that old unused physical machines lying around at home are perfect candidates for setting up a malware analysis lab, since they make it considerably more difficult for malware binaries to determine whether they are being executed in a controlled environment. When building our own malware analysis lab, we have to connect multiple machines together to form a network, which can be done simply with a virtual or physical switch, depending on the type of machines used.

Each cloud automated malware analysis service uses some kind of virtualization environment to run malware samples, like QEMU/KVM, VirtualBox, VMware, etc. Depending on the virtualization technology being used, a malware sample can use different techniques to detect that it's being analyzed and terminate immediately. The sample will then not be flagged as malicious, since it terminated prematurely without executing the malicious code.
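One well-known check of this kind looks at the network adapter's MAC address, because hypervisors assign addresses from recognizable vendor prefixes by default. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, the prefix table covers a handful of commonly documented defaults (VMware, VirtualBox, QEMU/KVM) and is not exhaustive, and uuid.getnode() may return a random value when no hardware address can be read.

```python
import uuid
from typing import Optional

# Default MAC prefixes commonly associated with hypervisors.
# Illustrative and incomplete; analysis labs often change these defaults.
VM_MAC_PREFIXES = {
    "00:05:69": "VMware",
    "00:0c:29": "VMware",
    "00:50:56": "VMware",
    "08:00:27": "VirtualBox",
    "52:54:00": "QEMU/KVM",
}


def hypervisor_from_mac(mac: str) -> Optional[str]:
    """Return the hypervisor suggested by a MAC address prefix, if any."""
    normalized = mac.strip().lower().replace("-", ":")
    return VM_MAC_PREFIXES.get(normalized[:8])


def local_mac() -> str:
    """Format the address reported by uuid.getnode() as aa:bb:cc:dd:ee:ff."""
    node = uuid.getnode()
    return ":".join(f"{(node >> shift) & 0xff:02x}" for shift in range(40, -8, -8))


if __name__ == "__main__":
    print(local_mac(), hypervisor_from_mac(local_mac()))
```

From the defender's side, the same table shows which artifacts an analysis environment needs to change, for example by assigning the guest a MAC address outside these ranges.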

In this section we've seen that different cloud malware analysis services use different virtualization technologies to run submitted malware samples. As far as I know, only Joe Sandbox has an option of running malware samples on actual physical machines, which defeats the techniques that rely on detecting virtualization. Still, there are many other techniques malware can use to detect whether it's being analyzed.

This is a cat-and-mouse game, in which new detection techniques are invented and used by malware samples on a daily basis. On the other side, there are numerous anti-detection techniques that prevent malware from determining that it's being executed in an automated malware analysis environment. When a new detection technique appears, a new anti-detection technique is usually put together to render it useless.

Conclusion

In this article we've presented the differences between multiple cloud malware analysis services that can be used to analyze different file formats and document types. Each service supports only a fraction of all the file formats and document types into which malicious code can be injected. Therefore, depending on the file we have to analyze, we can use the services that support its file format or document type.

To analyze a document, we first have to choose a service that supports it. Since there are many techniques an attacker can use to determine whether the malicious payload is being executed in an automated malware analysis environment, some malicious samples won't be analyzed correctly and will be reported as benign, resulting in false negatives. Such services should therefore be used together with a reverse engineer or malware analyst who can manually determine whether the file is malicious. Since a large number of malicious samples is distributed around the Internet every day, not every sample can be inspected manually, which is why cloud automated malware analysis services are a great way to speed up the analysis.


Dejan Lukan is a security researcher for InfoSec Institute and a penetration tester from Slovenia. He is very interested in finding new bugs in real-world software products through source code analysis, fuzzing and reverse engineering. He also has a great passion for developing his own simple scripts for security-related problems and for learning about new hacking techniques. He knows a great deal about programming languages, as he can write in a couple of dozen of them. His other passions are antivirus bypassing techniques, malware research and operating systems, mainly Linux, Windows and BSD. He also has his own blog available here: http://www.proteansec.com/.