TaintScope: A Checksum-Aware Directed Fuzzing Tool for Automatic Software Vulnerability Detection

Fuzz testing has proven successful in finding security vulnerabilities in large programs. However, traditional fuzz testing tools have a well-known common drawback: they are ineffective if most generated malformed inputs are rejected in the early stage of program running, especially when target programs employ checksum mechanisms to verify the integrity of inputs. In this paper, we present TaintScope, an automatic fuzzing system using dynamic taint analysis and symbolic execution techniques, to tackle the above problem. TaintScope has several novel contributions: 1) TaintScope is the first checksum-aware fuzzing tool to the best of our knowledge. It can identify checksum fields in input instances, accurately locate checksum-based integrity checks by using branch profiling techniques, and bypass such checks via control flow alteration. 2) TaintScope is a directed fuzzing tool working at X86 binary level (on both Linux and Window). Based on fine-grained dynamic taint tracing, TaintScope identifies which bytes in a well-formed input are used in security-sensitive operations (e.g., invoking system/library calls) and then focuses on modifying such bytes. Thus, generated inputs are more likely to trigger potential vulnerabilities. 3) TaintScope is fully automatic, from detecting checksum, directed fuzzing, to repairing crashed samples. It can fix checksum values in generated inputs using combined concrete and symbolic execution techniques. We evaluate TaintScope on a number of large real-world applications. Experimental results show that TaintScope can accurately locate the checksum checks in programs and dramatically improve the effectiveness of fuzz testing. TaintScope has already found 27 previously unknown vulnerabilities in several widely used applications, including Adobe Acrobat, Google Picasa, Microsoft Paint, and ImageMagick. Most of these severe vulnerabilities have been confirmed by Secunia and oCERT, and assigned CVE identifiers (such as CVE-2009-1882, CVE-2009-2688). Corresponding patches from vendors are released or in progress based on our reports.

[1]  Patrice Godefroid,et al.  Automated Whitebox Fuzz Testing , 2008, NDSS.

[2]  Martin C. Rinard,et al.  Taint-based directed whitebox fuzzing , 2009, 2009 IEEE 31st International Conference on Software Engineering.

[3]  Helen J. Wang,et al.  Tupni: automatic reverse engineering of input formats , 2008, CCS.

[4]  Koushik Sen,et al.  CUTE: a concolic unit testing engine for C , 2005, ESEC/FSE-13.

[5]  Rupak Majumdar,et al.  Directed test generation using symbolic grammars , 2007, ESEC-FSE companion '07.

[6]  Thomas W. Reps,et al.  Extracting Output Formats from Executables , 2006, 2006 13th Working Conference on Reverse Engineering.

[7]  Dawson R. Engler,et al.  KLEE: Unassisted and Automatic Generation of High-Coverage Tests for Complex Systems Programs , 2008, OSDI.

[8]  Jonathon T. Giffin,et al.  Impeding Malware Analysis Using Conditional Code Obfuscation , 2008, NDSS.

[9]  Peter Oehlert,et al.  Violating Assumptions with Fuzzing , 2005, IEEE Secur. Priv..

[10]  Tzi-cker Chiueh,et al.  A Forced Sampled Execution Approach to Kernel Rootkit Identification , 2007, RAID.

[11]  Harish Patil,et al.  Pin: building customized program analysis tools with dynamic instrumentation , 2005, PLDI '05.

[12]  Heng Yin,et al.  Dynamic Spyware Analysis , 2007, USENIX Annual Technical Conference.

[13]  Helen J. Wang,et al.  Discoverer: Automatic Protocol Reverse Engineering from Network Traces , 2007, USENIX Security Symposium.

[14]  Thomas Boutell,et al.  PNG (Portable Network Graphics) Specification Version 1.0 , 1997, RFC.

[15]  Christopher Krügel,et al.  Prospex: Protocol Specification Extraction , 2009, 2009 30th IEEE Symposium on Security and Privacy.

[16]  Will Drewry,et al.  Flayer: Exposing Application Internals , 2007, WOOT.

[17]  Satish Narayanasamy,et al.  Automatic logging of operating system effects to guide application-level architecture simulation , 2006, SIGMETRICS '06/Performance '06.

[18]  Xiangyu Zhang,et al.  Convicting exploitable software vulnerabilities: An efficient input provenance based approach , 2008, 2008 IEEE International Conference on Dependable Systems and Networks With FTCS and DCC (DSN).

[19]  Barton P. Miller,et al.  An empirical study of the reliability of UNIX utilities , 1990, Commun. ACM.

[20]  Dawson R. Engler,et al.  EXE: Automatically Generating Inputs of Death , 2008, TSEC.

[21]  David Brumley,et al.  Replayer: automatic protocol replay by binary analysis , 2006, CCS '06.

[22]  Giovanni Vigna,et al.  Static Detection of Vulnerabilities in x86 Executables , 2006, 2006 22nd Annual Computer Security Applications Conference (ACSAC'06).

[23]  Tao Wei,et al.  IntScope: Automatically Detecting Integer Overflow Vulnerability in X86 Binary Using Symbolic Execution , 2009, NDSS.

[24]  Adam Kiezun,et al.  Grammar-based whitebox fuzzing , 2008, PLDI '08.

[25]  Pedram Amini,et al.  Fuzzing: Brute Force Vulnerability Discovery , 2007 .

[26]  Peter Deutsch,et al.  ZLIB Compressed Data Format Specification version 3.3 , 1996, RFC.

[27]  Jeffrey C. Mogul,et al.  The VCDIFF Generic Differencing and Compression Data Format , 2002, RFC.

[28]  David A. Wagner,et al.  Dynamic Test Generation to Find Integer Bugs in x86 Binary Linux Programs , 2009, USENIX Security Symposium.

[29]  David L. Dill,et al.  A Decision Procedure for Bit-Vectors and Arrays , 2007, CAV.

[30]  Min Gyung Kang,et al.  Emulating emulation-resistant malware , 2009, VMSec '09.

[31]  Christopher Krügel,et al.  Exploring Multiple Execution Paths for Malware Analysis , 2007, 2007 IEEE Symposium on Security and Privacy (SP '07).

[32]  Xuxian Jiang,et al.  Automatic Protocol Format Reverse Engineering through Context-Aware Monitored Execution , 2008, NDSS.

[33]  Christopher Krügel,et al.  Automatic Network Protocol Analysis , 2008, NDSS.

[34]  Alessandro Orso,et al.  Dytan: a generic dynamic taint analysis framework , 2007, ISSTA '07.

[35]  Dawn Xiaodong Song,et al.  Dispatcher: enabling active botnet infiltration using automatic protocol reverse-engineering , 2009, CCS.

[36]  Nicholas Nethercote,et al.  Valgrind: a framework for heavyweight dynamic binary instrumentation , 2007, PLDI '07.

[37]  Kevin C. Almeroth,et al.  SNOOZE: Toward a Stateful NetwOrk prOtocol fuzZEr , 2006, ISC.

[38]  Koushik Sen,et al.  DART: directed automated random testing , 2005, PLDI '05.

[39]  Alessandro Orso,et al.  Penumbra: automatically identifying failure-relevant inputs using dynamic tainting , 2009, ISSTA.

[40]  Zhi Wang,et al.  ReFormat: Automatic Reverse Engineering of Encrypted Messages , 2009, ESORICS.

[41]  David Brumley,et al.  Automatic Patch-Based Exploit Generation is Possible: Techniques and Implications , 2008, 2008 IEEE Symposium on Security and Privacy (sp 2008).

[42]  R. Sekar An Efficient Black-box Technique for Defeating Web Application Attacks , 2009, NDSS.

[43]  Zhenkai Liang,et al.  Polyglot: automatic extraction of protocol message format using dynamic binary analysis , 2007, CCS '07.