Mining Bug Databases for Unidentified Software Vulnerabilities

Identifying software vulnerabilities is becoming more important as critical and sensitive systems increasingly rely on complex software. Previous work has suggested that some bugs are identified as vulnerabilities only long after the bug has been made public; these are known as hidden impact vulnerabilities. This paper discusses existing bug data mining classifiers and presents an analysis of vulnerability databases showing the need to mine common publicly available bug databases for hidden impact vulnerabilities. We present a vulnerability analysis from January 2006 to April 2011 for two well-known software packages: the Linux kernel and MySQL. We show that 32% (Linux) and 62% (MySQL) of the vulnerabilities discovered in this period were hidden impact vulnerabilities. We also show that the percentage of hidden impact vulnerabilities has increased from 25% to 36% in Linux and from 59% to 65% in MySQL over the last two years of the study. We then propose a methodology for identifying hidden impact vulnerabilities based on a text mining classifier for bug databases. Finally, we discuss potential challenges a development team faces when using such a classifier.
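The abstract does not specify the classifier, but the kind of text mining it describes is commonly realized as a bag-of-words classifier over bug-report text. The sketch below is a minimal, self-contained illustration (not the paper's actual method): a multinomial Naive Bayes model with Laplace smoothing that labels a report as a likely "vulnerability" or an ordinary "bug". The training examples, class labels, and the whitespace tokenizer are all hypothetical stand-ins for a real pipeline, which would typically add stemming and stop-word removal.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    # Lowercase and split on non-alphanumeric runs; a stand-in for the
    # stemming/stop-word preprocessing a real bug-report miner would use.
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]

class NaiveBayesBugClassifier:
    """Multinomial Naive Bayes with Laplace smoothing over bug-report text."""

    def train(self, reports):
        # reports: list of (text, label) pairs,
        # e.g. label in {"vulnerability", "bug"}.
        self.class_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in reports:
            self.class_counts[label] += 1
            for tok in tokenize(text):
                self.word_counts[label][tok] += 1
                self.vocab.add(tok)

    def classify(self, text):
        total = sum(self.class_counts.values())
        best_label, best_logp = None, float("-inf")
        for label in self.class_counts:
            # log prior + sum of log likelihoods with add-one smoothing
            logp = math.log(self.class_counts[label] / total)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokenize(text):
                logp += math.log((self.word_counts[label][tok] + 1) / denom)
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label

# Hypothetical training data: security-relevant vs. ordinary bug reports.
training = [
    ("buffer overflow when parsing crafted packet", "vulnerability"),
    ("null pointer dereference allows remote crash", "vulnerability"),
    ("typo in documentation for config option", "bug"),
    ("progress bar renders incorrectly on resize", "bug"),
]
clf = NaiveBayesBugClassifier()
clf.train(training)
prediction = clf.classify("overflow in packet parser")
```

Naive Bayes is a common baseline for this kind of triage because it trains in one pass and copes with the short, noisy text of bug reports; the base-rate problem the paper alludes to (vulnerabilities are rare among all bugs) would in practice require careful threshold and prior calibration.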
