SecurityXploded.com
PDF - Vulnerabilities, Exploits and Malwares | www.SecurityXploded.com
 
 
PDF - Vulnerabilities, Exploits and Malwares
Author: Dhanesh 
 
 
 
 
 
Introduction

Many people don't consider PDF files a possible threat and, well, I agree with them(!). It is not the PDF files but the rendering software we have to be afraid of. If you think I am referring to those Adobe Reader 0-days popping up periodically, hell yeah, you are RIGHT! We are going to talk about PDF files, a few Adobe Reader vulnerabilities, and the exploits and malware that come along with them ;)

 
 
 
Internals of PDF File
PDF files are binary files with a well-defined format, and they look like a collection of objects. You can open a PDF file in a text editor or hex editor to view its object structure.
[Image: a PDF file viewed in an editor]
As you can see, PDF files start with a magic header, %PDF or %%PDF, followed by the spec version number. From the next line onwards you can see a pattern emerging, like [obj][data][endobj]. This is the collection of objects I mentioned earlier. Each object is identified by an ID and a version (generation) number: 41 0 obj represents object 41, version 0. You can look into the PDF spec for a better understanding of the file architecture. You don't have to understand every detail of the spec, but it is worth looking specifically at streams, encodings, the JavaScript implementation, AcroForms, etc.
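To make the [obj][data][endobj] pattern concrete, here is a minimal Python sketch that lists the objects in raw PDF bytes. It is only an illustration, not how PDF Analyzer works: the function name list_objects and the sample bytes are made up for this example, and a plain regex like this can be fooled by binary stream data or by deliberately malformed files.

```python
import re

# Matches "<object number> <generation number> obj ... endobj".
# Assumption: a simple regex scan is good enough for well-formed samples.
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)\bendobj", re.DOTALL)


def list_objects(pdf_bytes):
    """Return (object_number, generation, body) tuples found in raw PDF bytes."""
    return [
        (int(m.group(1)), int(m.group(2)), m.group(3).strip())
        for m in OBJ_RE.finditer(pdf_bytes)
    ]


# Hypothetical two-object file, just enough to exercise the scan.
sample = (
    b"%PDF-1.4\n"
    b"41 0 obj\n<< /Type /Catalog >>\nendobj\n"
    b"42 0 obj\n(hello)\nendobj\n"
)
objects = list_objects(sample)
for number, generation, body in objects:
    print(number, generation, body)
```

A real parser would follow the cross-reference table instead of scanning, but malicious PDFs often have broken tables, so a brute-force scan is a common fallback.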

Before going further, I would like to explain a little more about streams. Streams are used to store data (images, text, JavaScript, etc.), and to store it efficiently PDF allows compression and encoding techniques such as Flate, LZW and RLE.
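As a rough illustration of what decoding a Flate stream involves, the Python sketch below pulls every stream ... endstream body out of raw PDF bytes and tries to inflate it with zlib. The names inflate_streams and STREAM_RE and the sample object are assumptions made for this example; it ignores the /Filter and /Length entries entirely and simply returns None for anything that is not valid zlib data.

```python
import re
import zlib

# Body between the "stream" keyword (plus its end-of-line) and "endstream".
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def inflate_streams(pdf_bytes):
    """Try to Flate-decode every stream body; None marks a failed attempt."""
    results = []
    for match in STREAM_RE.finditer(pdf_bytes):
        raw = match.group(1)
        try:
            # decompressobj tolerates the trailing end-of-line before
            # "endstream" (it ends up in unused_data).
            results.append(zlib.decompressobj().decompress(raw))
        except zlib.error:
            results.append(None)
    return results


# Hypothetical object carrying a compressed, harmless script.
payload = zlib.compress(b"app.alert('hi');")
sample = (
    b"5 0 obj\n<< /Filter /FlateDecode /Length %d >>\nstream\n" % len(payload)
    + payload
    + b"\nendstream\nendobj\n"
)
decoded = inflate_streams(sample)
print(decoded)
```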
PDF Analysis Tools
 
Manual analysis of a PDF is tricky and gets messy, and using just a plain text/hex editor to understand the true content of a PDF will take you nowhere. As a programmer I couldn't ignore this challenge, so I made a tool, PDF Analyzer, to solve this issue. I will use PDF Analyzer throughout this post, but you won't be able to get it yet as it is still a private build (I will release it soon ;) ).

For now you have other options; both commercial and freeware tools are available. Here are a few:
  • PDF Dissector by zynamics - commercial
  • PDF Stream Dumper by Dave - freeware
  • Various Python PDF parsers from Didier Stevens and the inREVERSE guys - freeware (search!)
 
PDF Analyzer is written in C# with only 3 external libraries: zlib (I should have used GZipStream with the 2-byte header hack), BeaEngine (thanks, BeatriX) and JSBeautifier (I ported 95% of the code from JS to C#). I spent around 2 weeks of free time on it. It may not be the fastest PDF parser, but it can handle every ill-formatted PDF I have in my repository ;).
 
 
Analyzing Real PDF Malwares
 
Adobe Reader's top vulnerabilities come from Adobe-specific JavaScript APIs. This gives us a chance to disable JavaScript and protect ourselves from those JavaScript-based exploits. Disabling JavaScript is crucial, but it doesn't fix vulnerabilities in other parts of Adobe Reader, such as the handling of embedded image files and Flash files.

Now we will look at some malware samples which exploit these vulnerabilities. You can find malware samples on many security blogs, and I must thank two of my friends who sent me a big archive of malicious PDFs for analysis and testing :) .
 
 
This particular sample splits its JavaScript into three streams and concatenates them using <</Names[(1)6 0 R (2)7 0 R (3)8 0 R]>>, which eventually refers to three objects (marked in red in the original screenshot). After beautification, it turns out to be exploiting a vulnerability that existed in Adobe Reader, namely this.media.newPlayer(null).
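When analysing a sample like this, the first step is to put the pieces back together in the order the /Names array lists them. The Python sketch below shows one way to do that, assuming you have already extracted and decoded the referenced streams into a dictionary keyed by object number. The helper names and the harmless script fragments are hypothetical, and a real file may add extra levels of indirection between the name entries and the streams.

```python
import re

# One "(name) <object number> <generation> R" pair inside a /Names array.
NAME_REF_RE = re.compile(r"\((.*?)\)\s*(\d+)\s+(\d+)\s+R")


def names_to_refs(names_dict):
    """Return (name, object_number, generation) in the order they appear."""
    return [
        (name, int(number), int(generation))
        for name, number, generation in NAME_REF_RE.findall(names_dict)
    ]


def join_script(names_dict, streams):
    """Concatenate decoded script pieces in /Names order.

    streams is a hypothetical dict: object number -> decoded JavaScript text.
    """
    return "".join(streams[number] for _, number, _ in names_to_refs(names_dict))


names = "<</Names[(1)6 0 R (2)7 0 R (3)8 0 R]>>"
# Harmless stand-in fragments, not taken from any real sample.
pieces = {6: "var s = 'he", 7: "llo'; app", 8: ".alert(s);"}
script = join_script(names, pieces)
print(script)
```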
 
 

It essentially sprays the heap with a NOP sled and shellcode and then calls the vulnerable function. The shellcode present here is a dropper/downloader; you can dump it to a file and use IDA to disassemble it.

 

Another PDF file, which exploits util.printf, is given below.

 
 
Again, you can dump the shellcode and disassemble it with IDA. Another option is to use PDF Analyzer's unescape functionality to disassemble the shellcode directly.
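Dumping the shellcode usually means turning the %uXXXX string passed to unescape() back into raw bytes. The Python sketch below is one way to do that by hand. It is not PDF Analyzer's implementation, the function name is made up, and it only handles the %uXXXX form, silently skipping anything else in the string. Each %uXXXX is one 16-bit value stored little-endian, so the two bytes come out swapped relative to how they are written.

```python
import re
import struct

# One "%uXXXX" escape: a 16-bit value written as four hex digits.
UNICODE_ESCAPE_RE = re.compile(r"%u([0-9a-fA-F]{4})")


def unescape_to_bytes(escaped):
    """Convert a '%uXXXX%uXXXX...' string into the bytes it represents."""
    out = bytearray()
    for match in UNICODE_ESCAPE_RE.finditer(escaped):
        # "<H" packs an unsigned 16-bit integer in little-endian order.
        out += struct.pack("<H", int(match.group(1), 16))
    return bytes(out)


# Two NOPs followed by the bytes 33 C0; a stand-in, not a real payload.
raw = unescape_to_bytes("%u9090%uc033")
print(raw.hex())

# Typical next step: write the bytes out and open the file in a disassembler.
# with open("shellcode.bin", "wb") as handle:
#     handle.write(raw)
```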
 
 

The disassembly starts with pretty straightforward steps to find the base address via delta calculation (call - pop - sub). Then it fetches the kernel32 base from PEB (fs:[0x30])->Ldr.InInitOrder[0].base_address. This is eventually used to load other modules and resolve APIs.

Malware writers use multiple techniques to protect their payload. These techniques involve obfuscation and multiple, multi-level use of encoding/compression schemes.
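Multi-level encoding simply means a stream lists several filters, and a reader applies them one after another in the listed order. The Python sketch below shows the idea with just two filters, ASCIIHexDecode and FlateDecode. The FILTERS table and decode_chain are names invented for this example, and the other filters PDF supports (LZW, ASCII85, RunLength and so on) are deliberately left out.

```python
import binascii
import zlib


def ascii_hex_decode(data):
    """Decode ASCIIHexDecode data: hex digits, optional whitespace, '>' ends it."""
    hex_digits = b"".join(data.split(b">", 1)[0].split())
    if len(hex_digits) % 2:
        hex_digits += b"0"  # an odd final digit is treated as followed by 0
    return binascii.unhexlify(hex_digits)


def flate_decode(data):
    """Decode FlateDecode (zlib) data, tolerating trailing bytes."""
    return zlib.decompressobj().decompress(data)


# Only two filters are covered in this sketch.
FILTERS = {"ASCIIHexDecode": ascii_hex_decode, "FlateDecode": flate_decode}


def decode_chain(data, filter_names):
    """Apply each named filter in order, as listed in the stream's /Filter array."""
    for name in filter_names:
        data = FILTERS[name](data)
    return data


# Build a stand-in stream: compress first, then hex-encode the result.
encoded = binascii.hexlify(zlib.compress(b"payload")) + b">"
plain = decode_chain(encoded, ["ASCIIHexDecode", "FlateDecode"])
print(plain)
```

The same loop also covers the case where one filter is applied twice in a row, which is a cheap trick for defeating tools that only decode a single level.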

 
 

If any of you have samples that use multi-level encoding, please send them to me ;) I would like to test them with PDF Analyzer.

I will conclude the exploit samples by posting the latest exploit, for the printSeps vulnerability. This code is taken from the PDF posted on the Full Disclosure list.

 
 
 
 
Conclusion
 
The evil actions of PDF malware vary from regular password stealers to rootkits. Once an attacker has attained arbitrary code execution, the rest is left to the malware writer's imagination. As malware writers mainly target Adobe Reader, try to shift to other PDF rendering software, or at least update to the latest version. There are free PDF readers like Sumatra or GhostScript; try those out, and always be cautious when opening a PDF file!
 
 
 