TIFF: Using Input Type Inference To Improve Fuzzing

Developers commonly use fuzzing techniques to hunt down all manner of memory corruption vulnerabilities during the testing phase. Irrespective of the fuzzer, input mutation plays a central role in providing adequate code coverage, as well as in triggering bugs. However, each class of memory corruption bugs requires a different trigger condition. While the goal of a fuzzer is to find bugs, most existing fuzzers merely approximate this goal by targeting their mutation strategies toward maximizing code coverage. In this work, we present a new mutation strategy that maximizes the likelihood of triggering memory-corruption bugs by generating fewer, but better inputs. In particular, our strategy achieves bug-directed mutation by inferring the type of the input bytes. To do so, it tags each offset of the input with a basic type (e.g., 32-bit integer, string, array etc.), while deriving mutation rules for specific classes of bugs. We infer types by means of in-memory data-structure identification and dynamic taint analysis, and implement our novel mutation strategy in a fully functional fuzzer which we call TIFF (Type Inference-based Fuzzing Framework). Our evaluation on real-world applications shows that type-based fuzzing triggers bugs much earlier than existing solutions, while maintaining high code coverage. For example, on several real-world applications and libraries (e.g., poppler, mpg123 etc.), we find real bugs (with known CVEs) in almost half of the time and upto an order of magnitude fewer inputs than state-of-the-art fuzzers.

[1]  David A. Wagner,et al.  Dynamic Test Generation to Find Integer Bugs in x86 Binary Linux Programs , 2009, USENIX Security Symposium.

[2]  Andreas Zeller,et al.  Fuzzing with Code Fragments , 2012, USENIX Security Symposium.

[3]  Xiangyu Zhang,et al.  Automatic Reverse Engineering of Data Structures from Binary Execution , 2010, NDSS.

[4]  Marcello Pagano,et al.  Principles of Biostatistics , 1992 .

[5]  Frank Tip,et al.  Aggregate structure identification and its application to program analysis , 1999, POPL '99.

[6]  Herbert Bos,et al.  Dowsing for Overflows: A Guided Fuzzer to Find Buffer Boundary Violations , 2013, USENIX Security Symposium.

[7]  David Walker,et al.  The next 700 data description languages , 2006, POPL '06.

[8]  Harish Patil,et al.  Pin: building customized program analysis tools with dynamic instrumentation , 2005, PLDI '05.

[9]  Helen J. Wang,et al.  Tupni: automatic reverse engineering of input formats , 2008, CCS.

[10]  Yang Liu,et al.  Skyfire: Data-Driven Seed Generation for Fuzzing , 2017, 2017 IEEE Symposium on Security and Privacy (SP).

[11]  Rishabh Singh,et al.  Learn&Fuzz: Machine learning for input fuzzing , 2017, 2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE).

[12]  Andrew Ruef,et al.  Evaluating Fuzz Testing , 2018, CCS.

[13]  Christopher Krügel,et al.  Driller: Augmenting Fuzzing Through Selective Symbolic Execution , 2016, NDSS.

[14]  Herbert Bos,et al.  Body Armor for Binaries: Preventing Buffer Overflows Without Recompilation , 2012, USENIX Annual Technical Conference.

[15]  Martin C. Rinard,et al.  Taint-based directed whitebox fuzzing , 2009, 2009 IEEE 31st International Conference on Software Engineering.

[16]  Christopher Krügel,et al.  SOK: (State of) The Art of War: Offensive Techniques in Binary Analysis , 2016, 2016 IEEE Symposium on Security and Privacy (SP).

[17]  Herbert Bos,et al.  Howard: A Dynamic Excavator for Reverse Engineering Data Structures , 2011, NDSS.

[18]  Herbert Bos,et al.  VUzzer: Application-aware Evolutionary Fuzzing , 2017, NDSS.

[19]  Shan Lu,et al.  Leveraging the short-term memory of hardware to diagnose production-run software failures , 2014, ASPLOS.

[20]  William K. Robertson,et al.  LAVA: Large-Scale Automated Vulnerability Addition , 2016, 2016 IEEE Symposium on Security and Privacy (SP).

[21]  Barton P. Miller,et al.  An empirical study of the reliability of UNIX utilities , 1990, Commun. ACM.

[22]  Herbert Bos,et al.  IFuzzer: An Evolutionary Interpreter Fuzzer Using Genetic Programming , 2016, ESORICS.

[23]  Hao Chen,et al.  Angora: Efficient Fuzzing by Principled Search , 2018, 2018 IEEE Symposium on Security and Privacy (SP).

[24]  Taddeus Kroes,et al.  JTR: A Binary Solution for Switch-Case Recovery , 2017, ESSoS.

[25]  David Brumley,et al.  Program-Adaptive Mutational Fuzzing , 2015, 2015 IEEE Symposium on Security and Privacy.

[26]  Abhik Roychoudhury,et al.  Coverage-Based Greybox Fuzzing as Markov Chain , 2016, IEEE Transactions on Software Engineering.

[27]  Xiangyu Zhang,et al.  Reverse Engineering Input Syntactic Structure from Program Execution and Its Applications , 2010, IEEE Transactions on Software Engineering.

[28]  Guofei Gu,et al.  TaintScope: A Checksum-Aware Directed Fuzzing Tool for Automatic Software Vulnerability Detection , 2010, 2010 IEEE Symposium on Security and Privacy.

[29]  Abhik Roychoudhury,et al.  Directed Greybox Fuzzing , 2017, CCS.

[30]  Chao Zhang,et al.  CollAFL: Path Sensitive Fuzzing , 2018, 2018 IEEE Symposium on Security and Privacy (SP).

[31]  Zhenkai Liang,et al.  Polyglot: automatic extraction of protocol message format using dynamic binary analysis , 2007, CCS '07.

[32]  Patrice Godefroid,et al.  Automated Whitebox Fuzz Testing , 2008, NDSS.

[33]  Angelos D. Keromytis,et al.  libdft: practical dynamic data flow tracking for commodity systems , 2012, VEE '12.

[34]  Koushik Sen,et al.  DART: directed automated random testing , 2005, PLDI '05.

[35]  Dawson R. Engler,et al.  EXE: automatically generating inputs of death , 2006, CCS '06.

[36]  Bogdan Korel,et al.  Automated Software Test Data Generation , 1990, IEEE Trans. Software Eng..