Toward Smarter Vulnerability Discovery Using Machine Learning

A Cyber Reasoning System (CRS) is designed to automatically find and exploit vulnerabilities in complex software. To be effective, a CRS integrates multiple vulnerability detection tools (VDTs), such as symbolic executors and fuzzers. Determining which VDTs can best find bugs in a large set of target programs, and how to optimally configure those VDTs, remains an open and challenging problem. Current solutions rely on heuristics that security analysts build from experience, intuition, and luck. In this paper, we present Central Exploit Organizer (CEO), a proof-of-concept tool that uses machine learning to select the most suitable vulnerability detection tool and its configuration. We show that CEO can predict the relative effectiveness of a given combination of vulnerability detection tool, configuration, and initial input. Its estimation accuracy is an improvement of 11% to 21% over random selection. We are releasing CEO and our dataset as open source to encourage further research.
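The abstract does not state which learning algorithm CEO uses or how it encodes its inputs. Purely as an illustration of the idea of predicting relative effectiveness and comparing against random selection, the sketch below assumes a k-nearest-neighbour regressor over hypothetical feature vectors, each encoding one (tool, configuration, initial input) candidate for a target program. It ranks candidates by predicted effectiveness and measures how often the top-ranked candidate is the truly best one, next to the expected accuracy of picking uniformly at random. The names `Run`, `knn_predict`, `rank_candidates`, `top1_accuracy` and `random_baseline`, the two-dimensional features, and the effectiveness values are all hypothetical and are not CEO's actual design or data.

```python
import math
from dataclasses import dataclass

# Hypothetical encoding of one (tool, configuration, initial input) candidate.
Features = tuple[float, ...]


@dataclass(frozen=True)
class Run:
    """One past VDT run: hypothetical features plus a measured effectiveness score."""
    features: Features     # assumed numeric encoding of tool, config, seed and program traits
    effectiveness: float   # assumed metric, e.g. normalized coverage or bugs found


def knn_predict(history: list[Run], query: Features, k: int = 3) -> float:
    """Estimate effectiveness as the mean score of the k most similar past runs."""
    nearest = sorted(history, key=lambda r: math.dist(r.features, query))[:k]
    return sum(r.effectiveness for r in nearest) / len(nearest)


def rank_candidates(history: list[Run], candidates: list[Features], k: int = 3) -> list[Features]:
    """Order candidates from most to least promising by predicted effectiveness."""
    return sorted(candidates, key=lambda c: knn_predict(history, c, k), reverse=True)


def top1_accuracy(history: list[Run], tasks: list[list[Run]], k: int = 3) -> float:
    """Fraction of tasks where the top-ranked candidate is the truly best one."""
    hits = 0
    for task in tasks:
        best = max(task, key=lambda r: r.effectiveness)
        chosen = rank_candidates(history, [r.features for r in task], k)[0]
        if chosen == best.features:
            hits += 1
    return hits / len(tasks)


def random_baseline(tasks: list[list[Run]]) -> float:
    """Expected top-1 accuracy of choosing one candidate uniformly at random per task."""
    return sum(1 / len(task) for task in tasks) / len(tasks)


# Toy, made-up history: candidates near (1, 0) historically did well, near (0, 1) poorly.
history = [
    Run((1.0, 0.0), 0.9), Run((0.9, 0.1), 0.8), Run((0.8, 0.0), 0.85),
    Run((0.0, 1.0), 0.1), Run((0.1, 0.9), 0.2), Run((0.0, 0.8), 0.15),
]
# One new target program with two candidate (tool, config, input) choices.
task = [Run((0.95, 0.05), 0.88), Run((0.05, 0.95), 0.12)]

print(rank_candidates(history, [r.features for r in task])[0])  # (0.95, 0.05)
print(top1_accuracy(history, [task]))                            # 1.0
print(random_baseline([task]))                                   # 0.5
```

A nearest-neighbour model is used here only because it fits in a few lines of standard-library Python; any regressor or pairwise ranking model could take its place, and the reported gains in the abstract come from the authors' own evaluation, not from this toy example.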
