EnFuzz: From Ensemble Learning to Ensemble Fuzzing

Fuzzing is widely used for software vulnerability detection. There are many kinds of fuzzers with different fuzzing strategies, and most of them perform well on their own targets. However, in industry practice and empirical studies, the performance and generalization ability of these well-designed fuzzing strategies are challenged by the complexity and diversity of real-world applications. In this paper, inspired by the idea of ensemble learning, we propose an ensemble fuzzing approach, EnFuzz, that integrates multiple fuzzing strategies to obtain better performance and generalization ability than any constituent fuzzer alone. First, we define the diversity of base fuzzers and choose recent, well-designed fuzzers as base fuzzers. Then, EnFuzz ensembles those base fuzzers with seed synchronization and result integration mechanisms. For evaluation, we implement a prototype of EnFuzz based on four strong open-source fuzzers (AFL, AFLFast, AFLGo, FairFuzz) and test it on Google's fuzzing test suite, which consists of widely used real-world applications. The 24-hour experiment indicates that, with the same resource usage, the four base fuzzers perform differently on different applications, while EnFuzz shows better generalization ability and always outperforms the others in terms of path coverage, branch coverage and crash discovery. Even compared with the best cases of AFL, AFLFast, AFLGo and FairFuzz, EnFuzz discovers 26.8%, 117%, 38.8% and 39.5% more unique crashes, executes 9.16%, 39.2%, 19.9% and 20.0% more paths and covers 5.96%, 12.0%, 21.4% and 11.1% more branches, respectively.
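The abstract names seed synchronization and result integration but does not describe how they work. The toy Python model below is only an illustrative sketch of the general idea, not EnFuzz's implementation: the `BaseFuzzer` class, both function names, and the seed and crash labels are hypothetical, and the sharing rule (every fuzzer receives every pooled seed) is a simplifying assumption.

```python
from dataclasses import dataclass, field


@dataclass
class BaseFuzzer:
    """Hypothetical stand-in for one base fuzzer's state."""
    name: str
    queue: set = field(default_factory=set)    # seeds this fuzzer has kept
    crashes: set = field(default_factory=set)  # crash identifiers it has found


def synchronize_seeds(fuzzers):
    """Give every fuzzer the seeds found by the others.

    Returns the number of seed copies made. Assumption: all pooled
    seeds are shared, with no filtering for "interesting" inputs.
    """
    pool = set().union(*(f.queue for f in fuzzers))
    copied = 0
    for f in fuzzers:
        missing = pool - f.queue
        f.queue |= missing
        copied += len(missing)
    return copied


def integrate_results(fuzzers):
    """Merge crash reports, counting each distinct crash once."""
    return set().union(*(f.crashes for f in fuzzers))


# Two simulated base fuzzers with partly overlapping findings.
fuzzers = [
    BaseFuzzer("AFL", queue={"s1", "s2"}, crashes={"c1"}),
    BaseFuzzer("AFLFast", queue={"s2", "s3"}, crashes={"c1", "c2"}),
]
copied = synchronize_seeds(fuzzers)
unique_crashes = integrate_results(fuzzers)
```

After synchronization, both simulated fuzzers hold the same three seeds, and the integrated result counts the shared crash `c1` only once. A real ensemble would run synchronization periodically while the fuzzers execute and would need a criterion for deciding which seeds are worth sharing.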
