Analysis of Malicious Documents - Part 5

November 7, 2017 by Security Ninja

In the last article, we reviewed how to analyze malicious PDF documents. In this last part of the series, we continue analyzing PDF documents with other tools. Specifically, we review the Origami framework, which can be used to inspect and extract various objects from PDF documents.

As a refresher, let's reiterate the essential keywords for PDF document analysis.

  1. /AA: This defines the Automatic Actions embedded in the document, which run when the user opens the document. Events such as cursor movement can also be declared here to trigger a particular action.
  2. /AcroForm: This shows whether Adobe forms are used in the PDF document.
  3. /ObjStm: This defines an object stream, which can hide specific objects. We will see this in the later part of the series.
  4. /JS: Embedded JavaScript within the document.
  5. /GoTo*: Redirects to the specified destination in the PDF file.
  6. /URI: A resource accessed at the given URL.
  7. /SubmitForm and /GoToR: These indicate data sent to a URL.
  8. /Launch: This launches a program.
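
As a quick triage step, the keywords above can be counted directly in the raw bytes of a file. The following is only a minimal sketch, not part of Origami: the function name count_keywords and the toy sample bytes are made up for illustration, and the scan does not handle names obfuscated with #xx hex escapes or objects hidden inside compressed object streams.

```python
import re

# Keywords from the list above. /GoTo is treated as a prefix (/GoToR, /GoToE, ...).
KEYWORDS = ["/AA", "/AcroForm", "/ObjStm", "/JS", "/GoTo", "/URI",
            "/SubmitForm", "/Launch"]


def count_keywords(data: bytes) -> dict:
    """Count how often each keyword appears as a PDF name in raw bytes."""
    counts = {}
    for kw in KEYWORDS:
        name = re.escape(kw.encode("ascii"))
        if kw == "/GoTo":
            # Allow any alphabetic suffix so /GoTo, /GoToR and /GoToE all count.
            pattern = name + rb"[A-Za-z]*"
        else:
            # The name must end here, so /JS does not match a longer name.
            pattern = name + rb"(?![A-Za-z0-9])"
        counts[kw] = len(re.findall(pattern, data))
    return counts


# Hypothetical toy input standing in for the bytes of a real PDF file.
sample = (b"1 0 obj << /Type /Catalog /AcroForm 5 0 R >> endobj\n"
          b"2 0 obj << /S /JavaScript /JS 3 0 R >> endobj\n"
          b"4 0 obj << /S /GoToR /F (x.pdf) >> endobj\n")

for keyword, hits in count_keywords(sample).items():
    print(f"{keyword:12} {hits}")
```

For a real file, the bytes would be read with open(path, "rb").read() and passed to the same function. A non-zero count is only a hint about where to look next, not proof of malicious content.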

Let's start using different utilities inside the Origami framework.

First, let's look at PDF Walker, a GUI program included as part of the Origami framework.

As we can see, PDF Walker has extracted all the embedded objects from the PDF. Now we must find the JavaScript object, so let's first search for references to JavaScript.

This search gives us a reference to Object 32.

To view this object, click on Document > Jump to Object and type the object number, as shown below.

This will show us the Object 32 stream.

Note that PDF Walker identifies the encoding used in the PDF document and applies the necessary decoding. For this document, PDF Walker identifies FlateDecode and applies the necessary filter.

Above we can see the decoded stream. We can dump this stream by right-clicking it and choosing the dump option.
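
FlateDecode is zlib/deflate compression, so the decoding step that PDF Walker performs can be approximated with Python's standard zlib module. This is a rough sketch under simplifying assumptions: the function name decode_flate_streams and the toy object are hypothetical, and the code ignores the /Length value, chained filters and encryption.

```python
import re
import zlib


def decode_flate_streams(data: bytes) -> list:
    """Return the inflated contents of every stream that zlib can decompress."""
    decoded = []
    # A stream body sits between the 'stream' and 'endstream' keywords.
    for match in re.finditer(rb"stream\r?\n(.*?)endstream", data, re.DOTALL):
        inflater = zlib.decompressobj()
        try:
            # decompressobj tolerates the trailing newline before 'endstream'.
            decoded.append(inflater.decompress(match.group(1)))
        except zlib.error:
            pass  # not Flate-encoded, or encoded with another filter
    return decoded


# Hypothetical toy object standing in for the compressed stream of a real PDF.
script = b"app.alert('hello');"
compressed = zlib.compress(script)
toy_pdf = (b"32 0 obj\n<< /Filter /FlateDecode /Length %d >>\nstream\n" % len(compressed)
           + compressed + b"\nendstream\nendobj\n")

print(decode_flate_streams(toy_pdf))  # [b"app.alert('hello');"]
```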

Origami also includes a command-line tool, pdfextract, which automatically locates, decodes and extracts JavaScript code. Note that pdfextract can also extract embedded images and file attachments. To instruct the tool to extract only JavaScript, we must supply the -j parameter.

It will create a directory named <filename>.dump/scripts and dump the extracted script inside it. Below is an example of the JavaScript extracted from sample.pdf.

Now let's explore another sample with both these tools.

We launch the sample inside PDF Walker.

Now search for JavaScript as was done earlier. This gives a reference to Object 10. Now, jump to Object 10.

As we can see, it references Object 12, so let's jump to Object 12. It points to Object 13. Continuing the same process, let's jump to Object 13.

Now the stream can be decoded and then analyzed further.
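
The manual jumping from Object 10 to 12 to 13 can be pictured as following indirect references ("N 0 R") until an object with a stream is reached. The sketch below illustrates that idea only. It is not how PDF Walker works internally, and the helper names and toy objects are invented for this example. It follows just the first reference in each object, which is enough for a simple chain like this one.

```python
import re


def parse_objects(data: bytes) -> dict:
    """Map object number -> raw body for each 'N G obj ... endobj' block."""
    objects = {}
    for m in re.finditer(rb"(\d+)\s+\d+\s+obj(.*?)endobj", data, re.DOTALL):
        objects[int(m.group(1))] = m.group(2)
    return objects


def follow_references(objects: dict, start: int) -> list:
    """Follow the first indirect reference in each object until a stream is found."""
    path, current = [start], start
    while current in objects and b"stream" not in objects[current]:
        ref = re.search(rb"(\d+)\s+\d+\s+R\b", objects[current])
        if ref is None or int(ref.group(1)) in path:
            break  # dead end or reference loop
        current = int(ref.group(1))
        path.append(current)
    return path


# Hypothetical toy objects mirroring the 10 -> 12 -> 13 chain described above.
toy_pdf = (b"10 0 obj\n<< /S /JavaScript /JS 12 0 R >>\nendobj\n"
           b"12 0 obj\n13 0 R\nendobj\n"
           b"13 0 obj\n<< /Length 19 >>\nstream\napp.alert('hello');\nendstream\nendobj\n")

print(follow_references(parse_objects(toy_pdf), 10))  # [10, 12, 13]
```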

Let's analyze the same PDF using pdfextract. This time we will extract everything in the sample PDF, not just the JavaScript, as shown below.

Below we can see that pdfextract has extracted 2 PDF streams and 2 scripts from the sample PDF file and dumped them to the locations mentioned.

After this, we can use SpiderMonkey to deobfuscate the script located in the sample.dump/scripts folder. Running SpiderMonkey shows the JS being extracted into two eval logs, eval 1 and eval 2. Looking at the contents of eval.002.log, we find the deobfuscated JS.

As discussed earlier, we can again see that the exploit targets the Collab.collectEmailInfo vulnerability. Note the use of a NOP sled in the different variables. To analyze further, we need to copy the shellcode in the variable brIW1yTY and convert it into an executable. We will do this using shellcode2exe.

Since the shellcode is %u-encoded, we first need to convert the Unicode escapes to hex, as shown below.
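
The conversion itself is simple: each %uXXXX escape is a 16-bit value whose two bytes are stored low byte first, so %uAABB becomes the bytes BB AA. Here is a minimal sketch of that step, assuming the input contains only %u escapes. The function name and the sample string are made up for illustration and are not the shellcode from the sample.

```python
import re


def unescape_percent_u(text: str) -> bytes:
    """Convert %uXXXX escapes to raw bytes in little-endian order."""
    out = bytearray()
    for high, low in re.findall(r"%u([0-9a-fA-F]{2})([0-9a-fA-F]{2})", text):
        # %uAABB is the 16-bit value 0xAABB; in memory the low byte (BB) comes first.
        out += bytes([int(low, 16), int(high, 16)])
    return bytes(out)


# Made-up example input, not the contents of the variable brIW1yTY.
encoded = "%u9090%u9090%uc033%u5040"
raw = unescape_percent_u(encoded)
print(raw.hex())  # 9090909033c04050
```

The resulting hex string (or the raw bytes written to a file) is the form a tool such as shellcode2exe expects as input.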

Now let's convert this into an exe using shellcode2exe.py, as shown below.

The shellcode is successfully converted to an exe binary.

This exe can be analyzed further. For example, a quick search for 'HTTP' in the binary reveals the following.
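
A search like this amounts to pulling printable strings out of the binary and keeping those that contain the search term. The following sketch shows one way to do it in Python. The function name find_strings and the byte blob are hypothetical placeholders, not data from the analyzed sample.

```python
import re


def find_strings(data: bytes, needle: bytes, min_len: int = 4) -> list:
    """Return printable ASCII runs of at least min_len bytes that contain needle."""
    runs = re.findall(rb"[\x20-\x7e]{%d,}" % min_len, data)
    # Compare case-insensitively so 'HTTP' also finds 'http://...'.
    return [r.decode("ascii") for r in runs if needle.lower() in r.lower()]


# Made-up bytes standing in for the contents of the generated exe.
blob = b"\x90\x90\xeb\x10http://example.invalid/payload.exe\x00\x00urlmon.dll\x00"
print(find_strings(blob, b"HTTP"))  # ['http://example.invalid/payload.exe']
```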


So, this is all for PDF analysis using these tools. There are other tools as well, such as PDF Stream Dumper, peepdf and AnalyzePDF, which can also be used to analyze malicious PDFs.

As you have seen in this article and the last one, there is a procedure that analysts must follow to properly identify, locate, extract, de-obfuscate and further analyze embedded scripts in such malicious documents.
