Assisting Vulnerability Detection by Prioritizing Crashes with Incremental Learning

The proliferation of Internet of Things (IoT) devices has tremendously enlarged the attack surface of networked embedded systems, making software vulnerabilities in these systems easier than ever for cybercriminals to exploit. Although fuzz testing is an effective technique for detecting vulnerabilities induced by memory corruption, it typically produces a massive number of crashes that require in-depth analysis, which impedes the timely identification and patching of potentially disastrous vulnerabilities. In this paper, we present a new approach that efficiently classifies crashes by their exploitability, helping human analysts prioritize the crashes to examine and thereby accelerating the discovery of vulnerabilities. A compact fingerprint of the dynamic execution trace of each crashing input is first generated using n-gram analysis and feature hashing. The fingerprints are then fed to an online classifier to build the discriminating model. The incremental learning enabled by the online classifier lets the model scale to a large number of crashes while remaining easy to update as new crashes arrive. Experiments on 4,392 exploitable crashes and 33,934 non-exploitable crashes show that our method achieves an F1-score of 95% in detecting exploitable crashes and significantly better accuracy than the popular crash classification tool !exploitable.
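
The pipeline described above (n-gram analysis, feature hashing, then an online classifier) can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes a trace is a list of instruction mnemonics, uses 3-grams hashed into a 4,096-dimensional signed vector, and uses a passive-aggressive (PA-I) update as one possible online learner. The names `fingerprint` and `OnlinePassiveAggressive`, the toy traces, and all parameter values are hypothetical.

```python
import hashlib

DIM = 2 ** 12  # fingerprint width; an assumed value, not taken from the paper


def ngrams(trace, n=3):
    """Return all contiguous n-grams of a trace (a list of tokens)."""
    return [tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)]


def fingerprint(trace, n=3, dim=DIM):
    """Hash n-gram counts into a sparse, L2-normalised vector {index: value}."""
    vec = {}
    for gram in ngrams(trace, n):
        digest = hashlib.md5(" ".join(gram).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        index = h % dim                        # bucket chosen by the hash
        sign = 1.0 if (h >> 63) == 0 else -1.0  # signed hashing trick
        vec[index] = vec.get(index, 0.0) + sign
    norm = sum(v * v for v in vec.values()) ** 0.5
    if norm == 0.0:
        return {}
    return {i: v / norm for i, v in vec.items() if v != 0.0}


class OnlinePassiveAggressive:
    """Binary PA-I learner; labels are +1 (exploitable) and -1 (not)."""

    def __init__(self, dim=DIM, C=1.0):
        self.w = [0.0] * dim
        self.C = C  # aggressiveness cap on each update step

    def score(self, x):
        return sum(self.w[i] * v for i, v in x.items())

    def predict(self, x):
        return 1 if self.score(x) > 0 else -1

    def partial_fit(self, x, y):
        """Update the model with one labelled fingerprint (incremental step)."""
        loss = max(0.0, 1.0 - y * self.score(x))  # hinge loss
        sq_norm = sum(v * v for v in x.values())
        if loss > 0.0 and sq_norm > 0.0:
            tau = min(self.C, loss / sq_norm)
            for i, v in x.items():
                self.w[i] += tau * y * v


# Hypothetical toy traces, for illustration only.
exploitable_trace = ["push", "mov", "call", "mov", "rep_movs", "ret", "jmp"]
benign_trace = ["push", "mov", "test", "je", "xor", "div", "pop"]

model = OnlinePassiveAggressive()
model.partial_fit(fingerprint(exploitable_trace), +1)
model.partial_fit(fingerprint(benign_trace), -1)
```

Because each call to `partial_fit` touches only the non-zero entries of one fingerprint, a newly triaged crash can be folded into the model without retraining on earlier crashes, which is the property the abstract attributes to incremental learning.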
