Analyzing malicious PDFs

November 20, 2013 by

PDF files have become very common in everyday work. It's hard to imagine business proposals without PDFs. The PDF format is used in almost all companies to share business deals, company brochures, and even invitations.

Previous years were not good for PDF users, as several vulnerabilities were published, such as a buffer overflow vulnerability in Adobe Reader versions prior to version 9. Many of the observed attacks tried to abuse such bugs through social engineering or by hosting malicious PDF files on the Internet. Simply opening a malicious PDF file could exploit a vulnerability to automatically download malicious code from the Internet and display a decoy PDF to trick you into believing that nothing is wrong.

For these reasons, malicious PDF files are frequently used in both targeted and mass-scale computer attacks. It's good to know how to analyze PDF files, but analysts first need a basic understanding of the PDF format before they can deem a file malicious. Here is the information you'll need to know.


PDF structure

The PDF format can deliver rich content, both static and dynamic. Combined, these elements can produce a visually appealing, interactive, and portable document. While we have all benefited from this feature-rich way of sharing information, there is a darker side: the dynamic PDF capabilities mentioned above can be, and have been, used to house malicious content. In previous years, cybercriminals embedded malicious scripts in PDFs to install malware and steal user credentials.

Normally, a malicious PDF's harmful behavior lives in a script embedded in the file. Such scripts can be written in a scripting language that PDF supports, and JavaScript is by far the most popular choice for this purpose. In most cases, the embedded script acts as a dropper: its job is to get OS-level malware installed on the victim's system.

PDF document structure

The body of a PDF file is built from the following basic object types:

  • Boolean values, representing true or false
  • Numbers
  • Strings
  • Names
  • Arrays, ordered collections of objects
  • Dictionaries, collections of objects indexed by names
  • Streams, usually containing large amounts of data
  • The null object
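To make these object types concrete, here is a minimal Python sketch that scans a tiny hand-written PDF fragment and labels each indirect object (`N G obj ... endobj`) by the type of value it holds. The regex-based approach, the helper names `list_objects` and `classify`, and the sample bytes are illustrative assumptions, not how a real PDF parser works: production parsers follow the cross-reference table and must cope with object streams, incremental updates, and deliberately malformed files.

```python
import re

# Naive matcher for indirect objects: "<object number> <generation> obj ... endobj".
# Real parsers locate objects through the cross-reference table instead.
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)\bendobj", re.DOTALL)
NUMBER_RE = re.compile(rb"[+-]?\d*\.?\d+")
STREAM_RE = re.compile(rb">>\s*stream\b")


def classify(body: bytes) -> str:
    """Roughly label an object's value by its first token."""
    body = body.strip()
    if body.startswith(b"<<"):
        # A dictionary immediately followed by the "stream" keyword is a stream.
        return "stream" if STREAM_RE.search(body) else "dictionary"
    if body.startswith(b"["):
        return "array"
    if body.startswith(b"/"):
        return "name"
    if body.startswith((b"(", b"<")):  # literal (...) or hex <...> string
        return "string"
    if body in (b"true", b"false"):
        return "boolean"
    if body == b"null":
        return "null"
    if NUMBER_RE.fullmatch(body):
        return "number"
    return "unknown"


def list_objects(data: bytes) -> list:
    """Return (object number, generation, type) for each indirect object."""
    return [(int(m.group(1)), int(m.group(2)), classify(m.group(3)))
            for m in OBJ_RE.finditer(data)]


SAMPLE = (
    b"%PDF-1.4\n"
    b"1 0 obj\n<< /Type /Catalog >>\nendobj\n"
    b"2 0 obj\n[1 2 3]\nendobj\n"
    b"3 0 obj\n(Hello)\nendobj\n"
    b"4 0 obj\n<< /Length 5 >>\nstream\nhello\nendstream\nendobj\n"
    b"5 0 obj\ntrue\nendobj\n"
    b"6 0 obj\nnull\nendobj\n"
    b"7 0 obj\n3.14\nendobj\n"
    b"8 0 obj\n/SomeName\nendobj\n"
)

for obj in list_objects(SAMPLE):
    print(obj)
```

Note that a stream is always a dictionary followed by raw bytes, which is why the sketch checks for `stream` only after seeing `<<`.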


Executing malware with a PDF

A launch action launches an application or opens or prints a document. We can use one of the many Adobe Acrobat exploits in the Metasploit Framework to embed an executable in a PDF.

For example, a launch action can be used to open cmd.
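Because a launch action has to name the program it runs, an analyst can look for it directly in the file. The Python sketch below flags indirect objects whose action subtype is `/Launch` and reports the file named in the first `/F (...)` entry, as in a `/Win << /F (cmd.exe) >>` dictionary. The helper name `find_launch_targets` and the sample bytes are hypothetical, and the regexes do not handle hex-escaped names such as `/L#61unch`, hex strings, or compressed object streams, all of which can be used to evade exactly this kind of check.

```python
import re

# Naive indirect-object matcher (ignores object streams and incremental updates).
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)\bendobj", re.DOTALL)
# An action dictionary whose subtype (/S) is /Launch.
LAUNCH_RE = re.compile(rb"/S\s*/Launch\b")
# First literal-string /F entry, e.g. "/F (cmd.exe)". Escaped parentheses and
# hex strings such as "/F <...>" are not handled in this sketch.
TARGET_RE = re.compile(rb"/F\s*\(([^)]*)\)")


def find_launch_targets(data: bytes) -> list:
    """Return (object number, launch target) for each object with a /Launch action."""
    hits = []
    for m in OBJ_RE.finditer(data):
        body = m.group(3)
        if not LAUNCH_RE.search(body):
            continue
        target = TARGET_RE.search(body)
        hits.append((int(m.group(1)),
                     target.group(1).decode("latin-1") if target else "<unknown>"))
    return hits


SAMPLE = (
    b"1 0 obj\n<< /Type /Catalog /OpenAction 2 0 R >>\nendobj\n"
    b"2 0 obj\n<< /Type /Action /S /Launch /Win << /F (cmd.exe) >> >>\nendobj\n"
)

print(find_launch_targets(SAMPLE))
```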

Malware can also be delivered through embedded JavaScript. Exploits that target the reader's JavaScript engine commonly use heap spraying to place shellcode at a predictable memory address.

When the victim opens a malicious PDF file, the reader executes the embedded JavaScript, which triggers the vulnerability; the shellcode then runs, downloading a Trojan from the Internet and executing it.
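Heap-spray code tends to leave recognizable traces once the JavaScript has been extracted from the PDF: long runs of `%uXXXX` escapes passed to `unescape()`, often with filler values such as `%u9090` or `%u0c0c`. The Python sketch below computes a few such indicators. The function name `heap_spray_indicators`, the two filler values, and the `min_escapes=20` threshold are assumptions chosen for illustration; this is a crude heuristic rather than a detector, and lightly obfuscated code will slip past it.

```python
import re

# A JavaScript unicode escape such as %u9090 (one 16-bit value per escape).
UNICODE_ESCAPE_RE = re.compile(r"%u[0-9a-fA-F]{4}")
# Filler values often seen in public heap-spray code; 0x90 is the x86 NOP opcode.
SLED_RE = re.compile(r"%u(?:9090|0c0c)", re.IGNORECASE)


def heap_spray_indicators(js: str, min_escapes: int = 20) -> dict:
    """Crude indicators that extracted JavaScript may be building a heap spray."""
    escapes = len(UNICODE_ESCAPE_RE.findall(js))
    uses_unescape = "unescape(" in js
    return {
        "unicode_escapes": escapes,
        "sled_values": len(SLED_RE.findall(js)),
        "uses_unescape": uses_unescape,
        # The threshold is an arbitrary assumption for this sketch.
        "suspicious": uses_unescape and escapes >= min_escapes,
    }


suspect = 'var sc = unescape("' + "%u9090" * 30 + '"); var fill = unescape("%u0c0c%u0c0c");'
print(heap_spray_indicators(suspect))
print(heap_spray_indicators('app.alert("Hello");'))
```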

Create a malicious PDF file with Metasploit

We are going to use the stack-based buffer overflow in Adobe Reader's "util.printf()" JavaScript function (CVE-2008-2992) to create a malicious PDF file.

The steps for creating our malicious PDF file are as follows:

Open msfconsole and execute the following command.

Once we have all the options set the way we want, we run "exploit" to create our malicious file.

Metasploit reports that the PDF file was created and shows the path where it was saved. If the file isn't visible in the file manager, press Ctrl+H to show hidden files and folders.

Before we send the malicious file to the victim, we need to set up a listener to catch the reverse connection the payload will make.

When the victim opens the malicious file, a session is established, and we can access the victim's system through Meterpreter.

Methods of analyzing malicious PDF files

Analyzing a PDF file involves examining, decoding, and extracting the contents of suspicious PDF objects that may be used to exploit a vulnerability in Adobe Reader and execute a malicious payload. A growing number of tools are designed to assist with this process. Broadly, PDF analysis can be done in two ways: online and offline.
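Decoding is often the first hurdle, because most stream contents (including embedded JavaScript) are compressed with the `/FlateDecode` filter, which is zlib/deflate. The following Python sketch, assuming a direct integer `/Length` where present, extracts each stream and inflates it with the standard `zlib` module. The helper name `decode_streams` and the sample are hypothetical, and other filters such as `/ASCIIHexDecode`, `/ASCII85Decode`, or chains of filters are deliberately left out.

```python
import re
import zlib

# Naive indirect-object matcher; real parsers follow the cross-reference table.
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)\bendobj", re.DOTALL)
# End of the stream dictionary, then the "stream" keyword and its end-of-line.
STREAM_START_RE = re.compile(rb">>\s*stream\r?\n")
# A direct integer /Length; indirect lengths such as "/Length 7 0 R" are skipped.
LENGTH_RE = re.compile(rb"/Length\s+(\d+)\b(?!\s+\d+\s+R)")


def decode_streams(data: bytes) -> list:
    """Return (object number, decoded bytes) for every stream object found."""
    results = []
    for m in OBJ_RE.finditer(data):
        body = m.group(3)
        start = STREAM_START_RE.search(body)
        if start is None:
            continue  # not a stream object
        header = body[:start.start()]
        length = LENGTH_RE.search(header)
        if length:
            raw = body[start.end():start.end() + int(length.group(1))]
        else:
            # Fall back to searching for the endstream keyword.
            end = body.find(b"endstream", start.end())
            if end == -1:
                end = len(body)
            raw = body[start.end():end].rstrip(b"\r\n")
        if b"/FlateDecode" in header:
            try:
                raw = zlib.decompressobj().decompress(raw)
            except zlib.error:
                pass  # corrupt or deliberately malformed data: keep it raw
        results.append((int(m.group(1)), raw))
    return results


payload = b"app.alert('decoded!');"
compressed = zlib.compress(payload)
pdf = (b"%PDF-1.4\n1 0 obj\n<< /Length " + str(len(compressed)).encode()
       + b" /Filter /FlateDecode >>\nstream\n" + compressed
       + b"\nendstream\nendobj\n")
print(decode_streams(pdf))
```

Using `/Length` rather than searching for `endstream` matters for binary data, which may legitimately end in bytes that look like line endings.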

Online PDF analyzers

Whenever a file seems suspicious or comes from an untrusted source, it's a good idea to scan it with one of the following online services before you open it. Online PDF analyzers make our work easier: we just submit the suspicious PDF file, and the analyzer scans it for several known exploits.


Wepawet

Wepawet is a service for detecting and analyzing web-based malware. It currently handles Flash, JavaScript, and PDF files. To use Wepawet, go to http://wepawet.iseclab.org and upload a sample or specify a URL; the resource will be analyzed and a report generated.

PDF Examiner

PDF Examiner by Malware Tracker scans the uploaded PDF for several known exploits and lets the user explore the structure of the file, as well as examine, decode, and dump PDF object contents. This makes it well suited to manual PDF analysis tasks. Go to www.malwaretracker.com, click the "+Pdf examiner scan" tab, and select the PDF to scan.


Jsunpack

Jsunpack is designed to automatically examine and deobfuscate JavaScript. Its features also include carving content from network packet capture (PCAP) files and identifying common client-side exploits. It can also examine PDF files for malicious JavaScript artifacts. Go to jsunpack.jeek.org.

Offline PDF analyzers

If we have to examine the PDF manually, the following tools can be used.

PDF Stream Dumper

After installing PDF Stream Dumper, load the suspicious PDF file and start looking around. You can see different colors on the right side; red marks headers that contain JavaScript tags.

The tool includes signatures for a number of known PDF exploits. To scan the file, select "Exploits Scan" from the menu. The scan shows that our malicious PDF contains an exploit for CVE-2008-2992.
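PDF Stream Dumper's own signature format isn't covered here, but a hypothetical signature for CVE-2008-2992 might look like the Python sketch below: it flags calls to `util.printf()` whose format string begins with an unusually large field width (public proof-of-concept code used format strings such as `%45000f`). The function name `looks_like_cve_2008_2992` and the `min_width=1000` cut-off are assumptions. The check only works on deobfuscated JavaScript, since something as simple as `util["printf"]` evades it.

```python
import re

# Hypothetical signature: util.printf() whose format string starts with a large
# field width, e.g. util.printf("%45000f", n). Only sees deobfuscated code.
UTIL_PRINTF_RE = re.compile(r"""util\.printf\s*\(\s*["']%(\d+)""")


def looks_like_cve_2008_2992(js: str, min_width: int = 1000) -> bool:
    """Flag util.printf calls with a field width of at least min_width (assumed cut-off)."""
    return any(int(width) >= min_width for width in UTIL_PRINTF_RE.findall(js))


print(looks_like_cve_2008_2992('var n = 1.2; util.printf("%45000f", n);'))  # True
print(looks_like_cve_2008_2992('util.printf("%5.2f", 3.14159);'))            # False
```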

We are now reasonably certain that we're dealing with a malicious PDF file that exploits the CVE-2008-2992 vulnerability in Acrobat Reader to extract and run a malicious executable embedded in the PDF.

We can get more information about the exploit by visiting cvedetails.com and searching for its CVE number.


Peepdf

Peepdf is a Python tool for exploring PDF files to find out whether they may be harmful. Its aim is to provide all the components a security researcher needs for PDF analysis, without having to use three or four different tools to perform all the tasks.

Here maliciouspdf.pdf is the PDF file we want to analyze.

If the PDF contains objects with JavaScript code, we can use peepdf's JS commands to analyze them. In our sample, object 5 contains JavaScript; analyzing it requires PyV8.

After analyzing the object's JavaScript code, we can recover the URL of the executable that the PDF retrieves.
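Once the JavaScript has been extracted and deobfuscated, pulling out URLs is straightforward. This Python sketch undoes simple `%XX` escaping with `urllib.parse.unquote` and then collects URLs, singling out those whose path ends in `.exe`. The helper names `extract_urls` and `executable_urls` and the `example.com` sample are hypothetical; `%uXXXX` escapes, string concatenation, and other obfuscation still need to be resolved first with a tool such as peepdf or jsunpack.

```python
import re
from urllib.parse import unquote

# A URL ends at whitespace, quotes, angle brackets, or parentheses.
URL_RE = re.compile(r"""https?://[^\s"'<>()]+""", re.IGNORECASE)


def extract_urls(js: str) -> list:
    """Return unique URLs (in order of appearance) after undoing %XX escapes."""
    return list(dict.fromkeys(URL_RE.findall(unquote(js))))


def executable_urls(js: str) -> list:
    """Keep only URLs whose path ends in .exe (any query string is ignored)."""
    return [u for u in extract_urls(js) if u.split("?")[0].lower().endswith(".exe")]


js = ('var a = "http://example.com/index.html";'
      'var b = unescape("%68%74%74%70://example.com/files/update.exe");')
print(extract_urls(js))
print(executable_urls(js))
```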


Origami

Origami is a Ruby framework designed to parse, analyze, and forge PDF documents. It can be used to create on-the-fly customized PDFs, or to inject (evil) code into already existing documents.


Pdfid

pdfid, by Didier Stevens, is a utility that extracts useful information from a PDF file. Specifically, it scans the PDF for keywords such as /JavaScript, obj, endobj, and stream, and counts how often each one appears. Because some PDF exploits rely on or alter these elements, pdfid can sometimes show the user what exactly is going on inside the PDF.
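The idea behind a keyword scanner like pdfid can be sketched in a few lines of Python: tokenize PDF names, undo `#xx` hex escapes (so `/J#61vaScript` counts as `/JavaScript`), and count a short list of risky keywords alongside structural ones such as `obj` and `stream`. This is a simplified, hypothetical re-implementation for illustration, not pdfid's actual code or output format; the helper names and the keyword list are assumptions, and the list is much shorter than the one pdfid checks.

```python
import re
from collections import Counter

# A short, assumed keyword list; pdfid itself checks more than this.
RISKY_NAMES = {"/JS", "/JavaScript", "/OpenAction", "/AA", "/Launch",
               "/EmbeddedFile", "/AcroForm", "/URI"}
# A PDF name token: "/" followed by regular (non-delimiter, non-space) characters.
NAME_RE = re.compile(rb"/[^\s/<>\[\]()%{}]+")
# Structural keywords; word boundaries stop "obj" matching inside "endobj".
WORD_RE = re.compile(rb"\b(?:endobj|endstream|obj|stream)\b")
HEX_ESCAPE_RE = re.compile(rb"#([0-9A-Fa-f]{2})")


def normalize_name(token: bytes) -> str:
    """Undo #xx escapes, so /J#61vaScript becomes /JavaScript."""
    return HEX_ESCAPE_RE.sub(lambda m: bytes([int(m.group(1), 16)]), token).decode("latin-1")


def count_keywords(data: bytes) -> Counter:
    """Count risky names (after normalization) and structural keywords."""
    counts = Counter()
    for token in NAME_RE.findall(data):
        name = normalize_name(token)
        if name in RISKY_NAMES:
            counts[name] += 1
    for word in WORD_RE.findall(data):
        counts[word.decode("ascii")] += 1
    return counts


sample = (b"1 0 obj\n<< /Type /Catalog /OpenAction 2 0 R >>\nendobj\n"
          b"2 0 obj\n<< /S /J#61vaScript /JS (app.alert(1)) >>\nendobj\n")
print(count_keywords(sample))
```

A non-zero count for names like `/JavaScript`, `/OpenAction`, or `/Launch` does not prove a file is malicious; it only tells the analyst which objects to inspect next.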


Protecting against malicious PDFs

  • Enable automatic updates.
  • Disable PDF browser integration.
  • Always install the latest patch or update, even for older Adobe product versions.
  • Disable JavaScript in the PDF reader.
  • Uncheck "Allow non-PDF file attachments with external applications" to mitigate the launch action vulnerability.
  • Consider alternative PDF readers such as Foxit, Sumatra, or PDF-XChange.


Conclusion

Many vulnerabilities have been found in PDF readers over the years, and the number keeps growing. Because simply opening a PDF file can exploit a vulnerability and automatically download malicious code from the Internet, it's important to analyze a suspicious PDF before opening it, using either the online or the offline approach. For protection, use an alternative PDF reader and always install the latest updates and patches for your PDF reader.





Warlock works as an Information Security Professional. He holds several global certifications, such as CEH, CHFI, OSCP, and ISO 27001 Lead Implementer. He has experience in penetration testing, social engineering, password cracking, and malware obfuscation. He also works with various organizations to help them strengthen the security of their applications and infrastructure.