Improving Vulnerability Inspection Efficiency Using Active Learning

Society needs more secure software, but subject matter experts in software security are in short supply. Development teams are therefore motivated to make the most of their experts' limited time. The goal of this paper is to improve the efficiency of software vulnerability inspection via HARMLESS, an active learning-based inspection support tool. HARMLESS incrementally updates its vulnerability prediction model (VPM) based on the latest human inspection results, then applies the model to prioritize human inspection effort toward the source code files most likely to contain vulnerabilities. HARMLESS is designed to have three advantages over conventional software vulnerability prediction methods. First, by integrating humans and the vulnerability prediction model in an active learning loop, HARMLESS keeps refining its VPM and can find vulnerabilities with reduced human inspection effort before a software project's first release. Second, by estimating the total number of vulnerabilities in the project, HARMLESS guides the humans to stop inspection once a target recall is reached. Third, HARMLESS applies redundant inspection (having different humans inspect the same source code file) to the files most likely to contain missed vulnerabilities, so that vulnerabilities overlooked by human inspectors can be retrieved efficiently. We evaluate HARMLESS via a simulation on Mozilla Firefox vulnerability data. Our results show that (1) HARMLESS finds 60, 70, 80, 90, 95, and 99% of vulnerabilities by inspecting 6, 8, 10, 16, 20, and 34% of source code files, respectively; (2) when targeting 90, 95, and 99% recall, HARMLESS stops early, after 23, 30, and 47% of source code files have been inspected, respectively; and (3) even when human reviewers fail to identify half of the vulnerabilities they inspect, HARMLESS recovers 96% of those missed vulnerabilities by redundantly inspecting half of the classified files.
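To make the incremental loop described above more concrete, below is a minimal sketch in Python with scikit-learn. Everything in it is an assumption for illustration only: the `oracle_label` callback standing in for a human inspector, the TF-IDF plus LinearSVC pipeline standing in for the VPM, and the crude "confirmed plus predicted positives" recall estimator are all hypothetical placeholders, not the paper's actual implementation.

```python
# A minimal sketch of a HARMLESS-style incremental inspection loop.
# Hypothetical throughout: `oracle_label` stands in for a human inspector,
# TF-IDF + LinearSVC stands in for the VPM, and the total-vulnerability
# estimate is a crude placeholder, not the estimator the paper uses.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

def inspect(files, oracle_label, target_recall=0.95, batch=10):
    """files: list of source-file contents (strings).
    oracle_label(i) -> 1 if a human judges file i vulnerable, else 0."""
    X = TfidfVectorizer().fit_transform(files)
    labels = {}                                  # file index -> human label
    rng = np.random.default_rng(0)
    while True:
        unlabeled = [i for i in range(len(files)) if i not in labels]
        if not unlabeled:
            break
        if len(set(labels.values())) < 2:
            # Bootstrap: sample randomly until both classes are observed.
            picks = rng.choice(unlabeled, min(batch, len(unlabeled)),
                               replace=False).tolist()
        else:
            # Retrain the VPM on all human inspection results so far.
            idx = sorted(labels)
            clf = LinearSVC().fit(X[idx], [labels[i] for i in idx])
            scores = clf.decision_function(X[unlabeled])
            # Placeholder estimate of the total number of vulnerable files:
            # confirmed positives + predicted positives still unlabeled.
            found = sum(labels.values())
            est_total = found + int((scores > 0).sum())
            if est_total and found >= target_recall * est_total:
                break                            # target recall reached
            # Route the most-likely-vulnerable files to the human next.
            order = np.argsort(scores)[::-1][:batch]
            picks = [unlabeled[int(j)] for j in order]
        for i in picks:
            labels[int(i)] = oracle_label(int(i))
    return labels
```

The greedy "inspect the highest-scored files first" policy mirrors the continuous active learning protocol from the technology-assisted review literature that this line of work builds on; the paper's actual recall estimator and its redundant-inspection step for recovering human errors are omitted here for brevity.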
