Analysis of Malicious Documents 2
In the previous part of this series, we covered some basics and initial concepts for analyzing malicious files. In this part, we will consider more complex examples and look at methods for parsing the contents of malicious documents. Previously, we saw examples where VBA code is used to download binaries from the internet. Let's now see how VBA code can be used to extract a binary embedded in the document itself.
Before we begin the new analysis, let's go back to the last article, where we identified the vbaProject.bin file in a malicious document, and analyze that vbaProject.bin a bit more.
If you remember, in the last article we saw a URL pointing to the Softonic site to download a binary, but above we saw a URL pointing to Dropbox. Why? Let's look into this. We will load the vbaProject.bin file into the SSView tool, which lets us view the text embedded inside streams. As we can see, the Dropbox URL is embedded inside the _SRP_0 stream. The question remains, however: what is this URL doing in the stream at all? It turns out that SRP streams are present in Office documents that contain VBA macro code, and they store a compiled (cached) form of that code. They can therefore reveal a previous version of the macros used in the document, which is very useful for analysts connecting the dots. Analysts should also be aware, though, that a malicious author can easily add a fake SRP stream to the document to confuse them.
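To make the idea of pulling text out of a stream concrete, here is a minimal Python sketch. It is not how SSView works internally; it only assumes that the raw bytes of a stream (for example _SRP_0) have already been dumped, and the function name find_urls is hypothetical. Because such streams may hold text as plain ASCII or as UTF-16LE, the sketch checks both.

```python
import re

# URL made of printable ASCII characters (assumption: http/https only).
URL_RE = re.compile(rb"https?://[\x21-\x7e]{4,}")
# Runs of UTF-16LE text: each printable character followed by a zero byte.
WIDE_RE = re.compile(rb"(?:[\x21-\x7e]\x00){8,}")


def find_urls(blob: bytes) -> list:
    """Return URLs stored as ASCII or UTF-16LE text inside raw stream bytes."""
    hits = [m.group().decode("ascii") for m in URL_RE.finditer(blob)]
    for run in WIDE_RE.finditer(blob):
        narrow = run.group()[::2]  # drop the zero bytes of the wide run
        hits += [m.group().decode("ascii") for m in URL_RE.finditer(narrow)]
    return hits
```

Any URL recovered this way still has to be interpreted with care, since the stream content may be stale or planted.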
Note: I analyzed the other streams but did not find anything useful.
Now let's see how to analyze documents that do not only download code but also extract embedded code and execute it.
Below is an example of analyzing a malicious document with another tool, olevba.py, written by Philippe Lagadec. It automatically parses Microsoft Office documents (both the binary OLE format and the newer XML-based format), then extracts and displays the embedded macros. The script also analyzes various portions of the document and flags the risky ones.
For example, running olevba.py on the sample document will produce the following output.
Let's run olevba.py on the previously identified malicious file.
Running the command below produces the following output:
As we can see, the malicious code does the following:
- A function is called in Auto_Open, which executes as soon as the user opens the document and enables macros.
- It traverses the paragraphs in the document.
- It extracts the characters from the paragraphs that represent hexadecimal values.
- It converts the characters extracted in the previous step to raw binary form.
- It saves the file in the %USERPROFILE% folder.
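The steps above can be reproduced on the analyst's side with a few lines of Python. This is only a sketch of the decoding logic, not the macro's actual code: it assumes that a payload paragraph consists of nothing but an even number of hex digits, and the name paragraphs_to_bytes is hypothetical.

```python
import re

HEX_ONLY = re.compile(r"[0-9A-Fa-f]+")


def paragraphs_to_bytes(paragraphs):
    """Decode paragraphs made up purely of hex digits into raw bytes."""
    chunks = []
    for text in paragraphs:
        text = text.strip()
        # Assumption: payload paragraphs hold nothing but an even number of
        # hex digits; anything else is treated as ordinary document text.
        if len(text) % 2 == 0 and HEX_ONLY.fullmatch(text):
            chunks.append(bytes.fromhex(text))
    return b"".join(chunks)


sample = ["Dear customer,", "4D5A9000", "0300", "Regards"]
payload = paragraphs_to_bytes(sample)
print(payload[:2])  # b'MZ', the first two bytes of a PE file
```

Unlike the macro, the sketch never writes or runs anything; the decoded bytes should be saved and handled only inside an isolated analysis environment.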
We can also see the contents of the embedded file by simply unzipping the document and opening its contents in Notepad. We can then parse them with a script and save the parsed content as an executable, which can be analyzed like any other malicious executable.
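The unzip-and-read step can also be scripted. The sketch below assumes the sample is a Word document in the XML-based format, where the body text lives in the word/document.xml part; the function name read_paragraphs is hypothetical, and the parsing script mentioned above is not shown in this article, so this covers only the reading part.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the w:p (paragraph) and w:t (text) elements.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def read_paragraphs(docx):
    """Return the text of every paragraph in word/document.xml.

    `docx` may be a file path or a binary file-like object.
    Parse untrusted samples only inside an isolated analysis machine.
    """
    with zipfile.ZipFile(docx) as archive:
        xml_bytes = archive.read("word/document.xml")
    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # A paragraph's text is split across runs; join the pieces back together.
        paragraphs.append("".join(t.text or "" for t in para.iter(W_NS + "t")))
    return paragraphs
```

The returned list of paragraph strings can then be passed to whatever decoding script is used to rebuild the executable.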
Now let's take a look at some more examples of the OLE structured storage (SS) format, since the majority of attacks on Office documents target the old binary formats rather than the new XML-based ones. Structured storage treats a single file as a collection of two types of objects: storages and streams. Different Office applications use different storages and streams. For example, the ppt format stores its content in a "PowerPoint Document" stream, xls uses a "Workbook" stream, and .doc uses various streams.
In the section below, we will look at a sample malicious ppt file. Let's load this file into OffVis, a free tool from Microsoft that parses a document, displays its raw contents, and can tell an analyst whether the document contains a known exploit. For example, loading the sample.ppt file into OffVis reveals an exploit targeting CVE-2006-0022. Note that the tool can only detect exploits for the CVEs whose detection logic is built into it.
We can also use the scan command of OfficeMalScanner to look for known signature patterns inside the document. The scan option looks for:
- OLE data, using the signature \xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1
- PE files, using the MZ signature at offset 00
- Function prologues
- Hash comparisons (as used in API hashing)
- Known API calls such as URLDownloadToFile, LoadLibrary, GetProcAddress, etc.
- FS:[30h] and FS:[00h] references
- CALL next/POP and JMP []/CALL/POP patterns
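To give a feel for what a signature scan of this kind involves, here is a small Python sketch covering a few of the checks listed above. It is not OfficeMalScanner's implementation: the function names are hypothetical, the prologue check assumes the classic 32-bit sequence push ebp / mov ebp, esp (55 8B EC), and the PE check follows the e_lfanew field at offset 0x3C to cut down on stray "MZ" matches.

```python
import struct

OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")
PROLOGUE = bytes.fromhex("558BEC")  # push ebp; mov ebp, esp (assumed pattern)
API_NAMES = (b"URLDownloadToFile", b"LoadLibrary", b"GetProcAddress")


def find_all(blob, needle):
    """Return every offset at which needle occurs in blob."""
    offsets, start = [], blob.find(needle)
    while start != -1:
        offsets.append(start)
        start = blob.find(needle, start + 1)
    return offsets


def find_pe(blob):
    """Return offsets of 'MZ' headers whose e_lfanew points at a 'PE' header."""
    hits = []
    for off in find_all(blob, b"MZ"):
        if off + 0x40 > len(blob):
            continue  # too close to the end to hold a DOS header
        (e_lfanew,) = struct.unpack_from("<I", blob, off + 0x3C)
        pe = off + e_lfanew
        if blob[pe:pe + 4] == b"PE\x00\x00":
            hits.append(off)
    return hits


def scan(blob):
    """Map each signature name to the offsets where it was found."""
    findings = {
        "ole_magic": find_all(blob, OLE_MAGIC),
        "pe_file": find_pe(blob),
        "prologue": find_all(blob, PROLOGUE),
    }
    for name in API_NAMES:
        findings[name.decode()] = find_all(blob, name)
    return {key: offs for key, offs in findings.items() if offs}
```

Short byte patterns such as the prologue will also match by coincidence in ordinary data, so hits of this kind are leads to verify in a disassembler rather than proof of shellcode.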
One limitation of using the scan parameter alone is that it only works on unencrypted content. If the document has shellcode embedded in encrypted form, we can add the brute parameter ('scan brute'), like this:
and the output will look like the following:
OfficeMalScanner tries every possible 1-byte key for common obfuscation algorithms such as XOR, ADD, and ROL that may have been used to encrypt the embedded shellcode.
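The brute-force idea is simple enough to sketch in Python. This illustrates the technique and is not OfficeMalScanner's code: the function names are hypothetical, and the two markers searched for (the OLE signature and the usual DOS stub message) are assumptions chosen for the example. Every key is applied to the whole buffer, and the result is searched for a marker.

```python
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")
DOS_STUB = b"This program cannot be run in DOS mode"


def xor(data, key):
    return bytes(b ^ key for b in data)


def add(data, key):
    return bytes((b + key) & 0xFF for b in data)


def rol(data, count):
    # Rotate every byte left by count bits (1..7).
    return bytes(((b << count) | (b >> (8 - count))) & 0xFF for b in data)


def brute(blob, markers=(OLE_MAGIC, DOS_STUB)):
    """Try every 1-byte key and report (algorithm, key, offset) for each hit.

    The key reported is the one that, applied to blob, reveals the marker.
    """
    transforms = (
        ("XOR", xor, range(1, 256)),
        ("ADD", add, range(1, 256)),
        ("ROL", rol, range(1, 8)),
    )
    hits = []
    for name, func, keys in transforms:
        for key in keys:
            decoded = func(blob, key)
            for marker in markers:
                offset = decoded.find(marker)
                if offset != -1:
                    hits.append((name, key, offset))
    return hits
```

Note that for ADD and ROL the revealing key is the inverse of the one used to encode: content encoded with ADD 3 is revealed by ADD 253, and content rotated left by 3 bits is revealed by rotating left by 5.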
We can also use the 'scan debug' feature of OfficeMalScanner to see the disassembled code found at the above locations, like this:
and below are the matches for the CALL/POP patterns, which give us an indication of where shellcode might reside.
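Why do CALL/POP sequences point to shellcode? Position-independent code often needs its own address, and a CALL pushes the address of the next instruction, which a POP then reads back. The Python sketch below looks for two 32-bit x86 encodings of that trick. The function names and the 0x100-byte search window are assumptions made for this example, not OfficeMalScanner's logic.

```python
import re
import struct

# CALL $+5 (E8 00 00 00 00) immediately followed by POP reg (58..5F).
CALL_NEXT_POP = re.compile(rb"\xE8\x00\x00\x00\x00[\x58-\x5F]")
POP_REGS = ("eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi")


def find_call_next_pop(blob):
    """Return (offset, register) for each CALL next / POP reg sequence."""
    return [(m.start(), POP_REGS[m.group()[-1] - 0x58])
            for m in CALL_NEXT_POP.finditer(blob)]


def find_call_back_pop(blob, max_back=0x100):
    """Return (call_offset, pop_offset) where a backward CALL lands on a POP.

    This is the CALL/POP half of the JMP/CALL/POP pattern; max_back is an
    assumed limit on how far back the CALL may jump.
    """
    hits = []
    off = blob.find(b"\xE8")
    while off != -1 and off + 5 <= len(blob):
        (rel,) = struct.unpack_from("<i", blob, off + 1)
        target = off + 5 + rel  # CALL target is relative to the next instruction
        if -max_back <= rel < 0 and target >= 0 and 0x58 <= blob[target] <= 0x5F:
            hits.append((off, target))
        off = blob.find(b"\xE8", off + 1)
    return hits
```

Both checks are heuristics: the same byte sequences can occur in ordinary data, which is why the disassembly around each hit still needs to be reviewed.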
So, in this part of the series we have seen more OfficeMalScanner options such as scan, brute, and debug, which are handy features, and learned about structured storage, streams, and so forth. In the next part, we will look at how to locate and extract the shellcode from these documents. We will also start the analysis of PDF documents.