Security Bug Report Usage for Software Vulnerability Research: A Systematic Mapping Study

Context: Security bug reports are reports from bug tracking systems that describe security vulnerabilities occurring in software projects, together with their resolutions. Researchers use security bug reports to conduct research related to software vulnerabilities. A mapping study of publications that use security bug reports can inform researchers about (i) the research topics that have been investigated, and (ii) potential research avenues in the field of software vulnerabilities.

Objective: The objective of this paper is to help researchers identify research gaps related to software vulnerabilities by conducting a systematic mapping study of research publications that use security bug reports.

Method: We perform a systematic mapping study of research that uses security bug reports for software vulnerability research by searching five scholarly databases: (i) IEEE Xplore, (ii) ACM Digital Library, (iii) ScienceDirect, (iv) Wiley Online Library, and (v) Springer Link. From these five databases, we select 46 publications that use security bug reports by systematically applying inclusion and exclusion criteria. Using qualitative analysis, we identify the research topics investigated in our collected set of publications.

Results: We identify three research topics that are investigated in our set of 46 publications: (i) vulnerability classification; (ii) vulnerability report summarization; and (iii) vulnerability dataset construction. Of the 46 studied publications, 42 (about 91%) focus on vulnerability classification.

Conclusion: Findings from our mapping study can be leveraged to identify research opportunities in the domains of software vulnerability classification and automated vulnerability repair techniques.
