Paper Information - Improving Fitness Function for Language Fuzzing with PCFG Model

In this paper, we propose using machine learning techniques to model the vagueness of bugs in language interpreters and to develop a fitness function for language fuzzing based on genetic programming. The basic idea is that bug-triggering scripts usually contain uncommon usages that programmers are unlikely to write in everyday development. We capture this uncommonness with a probabilistic context-free grammar (PCFG) model and a Markov model, which compute the probability of each script so that bug-triggering scripts receive lower probabilities and therefore higher fitness values. We use ROC (Receiver Operating Characteristic) curves to evaluate how well fitness functions distinguish bug-triggering scripts from normal scripts. For the evaluation, we use a large corpus of JavaScript scripts from GitHub and proof-of-concept (PoC) test cases from bug reports in SpiderMonkey's Bugzilla. The ROC curves from the experiments show that our method is better at ranking bug-triggering scripts among the top-K elements.
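As a rough illustration of the idea in the abstract, the following minimal sketch estimates PCFG rule probabilities from a corpus of derivations, scores a script by its negative log-probability, and measures ranking quality with the area under the ROC curve. It is not the paper's implementation: the toy grammar, the names `estimate_pcfg`, `fitness`, and `roc_auc`, the per-rule length normalization, and the `floor` probability for unseen rules are all assumptions made for this example, and the Markov model component is omitted.

```python
import math
from collections import Counter

# A derivation is the list of (lhs, rhs) production rules used in a script's
# parse tree. These four rules are a hypothetical toy, not a JavaScript grammar.
EXPR_STMT = ("Stmt", ("Expr",))
WITH_STMT = ("Stmt", ("With",))
CALL = ("Expr", ("Call",))
LITERAL = ("Expr", ("Literal",))


def estimate_pcfg(corpus):
    """Maximum-likelihood estimate: P(lhs -> rhs) = count(rule) / count(lhs)."""
    rule_counts = Counter(rule for derivation in corpus for rule in derivation)
    lhs_counts = Counter()
    for (lhs, _rhs), n in rule_counts.items():
        lhs_counts[lhs] += n
    return {rule: n / lhs_counts[rule[0]] for rule, n in rule_counts.items()}


def fitness(derivation, probs, floor=1e-6):
    """Average negative log-probability per rule (assumed normalization).

    Rules never seen in the corpus get the assumed `floor` probability,
    so rarer derivations receive higher fitness.
    """
    if not derivation:
        return 0.0
    total = -sum(math.log(probs.get(rule, floor)) for rule in derivation)
    return total / len(derivation)


def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy "normal" corpus: expression statements dominate, `with` is rare.
corpus = [
    [EXPR_STMT, CALL],
    [EXPR_STMT, LITERAL],
    [EXPR_STMT, CALL],
    [WITH_STMT, LITERAL],
]
probs = estimate_pcfg(corpus)

common_script = [EXPR_STMT, CALL]
rare_script = [WITH_STMT, LITERAL]
print(fitness(common_script, probs) < fitness(rare_script, probs))  # True
```

Under these assumptions, a script built from rarely used productions scores higher than one built from common productions, which is the ordering the fitness function is meant to produce. Calling `roc_auc` with fitness values as scores and bug-triggering flags as labels gives a single-number summary of the ROC curve, where 0.5 corresponds to random ranking and 1.0 to perfect separation.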
