Towards Generalizing Defect Prediction Models

Software quality is vital to the success of a software project, and fixing defects is a major activity for continuously improving it. Because real development teams usually have limited resources and tight schedules, it is important to prioritize testing activities and optimize the use of development resources. Predicting defective entities (e.g., files or classes) in advance helps achieve this goal, and defect prediction has attracted considerable attention from both academia and industry over the last decade. A typical defect prediction model is built from software metrics and labelled defect data collected from the history of a software project. Such a model can be applied within the same project (within-project defect prediction) or to other projects (cross-project defect prediction). However, because development processes differ across projects, a defect prediction model is often not transferable and must be rebuilt whenever the target project changes. Since building and maintaining a model for each project requires additional effort, it is of significant interest to generalize defect prediction models. A generalized model removes the need to rebuild a model for each target project and can also reveal general relationships between software metrics and defects.

In this thesis, we analyze the feasibility of generalizing defect prediction models. First, we analyze how the distributions of software metric values vary across projects with different context factors (e.g., programming language and system size). We observe that these distributions do vary across projects, but they can also be similar across projects with different context factors. Second, we investigate the impact of pre-processing steps (in particular, the transformation and aggregation of software metrics) on the performance of defect prediction models. We find that these steps affect performance and therefore must be considered when generalizing defect prediction models. Finally, we propose two approaches for generalizing defect prediction models: a supervised one (requiring training data) and an unsupervised one (requiring no training data). Our results show that both approaches are feasible for generalizing defect prediction models.
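To make the two pre-processing steps more concrete, here is a minimal sketch, assuming hypothetical method-level metric values (e.g., lines of code per method) that must be rolled up to the file level. It applies a few common aggregation schemes (sum, mean, median, maximum and the Gini coefficient) and a log(x + 1) transformation. The data, the function names (`gini`, `aggregate`, `log_transform`) and the particular set of schemes are illustrative assumptions, not the exact configuration studied in the thesis.

```python
import math
import statistics
from typing import Callable, Dict, List


def gini(values: List[float]) -> float:
    """Gini coefficient of non-negative values (0 means perfectly equal)."""
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    xs = sorted(values)
    # Formula on ascending values with ranks i = 1..n:
    # G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return (2.0 * weighted) / (n * total) - (n + 1.0) / n


# Hypothetical aggregation schemes for rolling method-level values up to a file.
AGGREGATORS: Dict[str, Callable[[List[float]], float]] = {
    "sum": lambda v: float(sum(v)),
    "mean": statistics.mean,
    "median": statistics.median,
    "max": lambda v: float(max(v)),
    "gini": gini,
}


def aggregate(method_values: Dict[str, List[float]]) -> Dict[str, Dict[str, float]]:
    """Aggregate each file's method-level values with every scheme."""
    return {
        file: {name: float(fn(values)) for name, fn in AGGREGATORS.items()}
        for file, values in method_values.items()
    }


def log_transform(x: float) -> float:
    """log(x + 1), a common transformation for reducing skew in metric values."""
    return math.log1p(x)


if __name__ == "__main__":
    # Hypothetical method-level LOC values for two files.
    methods = {
        "A.java": [10, 10, 10, 10],
        "B.java": [1, 1, 1, 37],
    }
    for file, feats in aggregate(methods).items():
        # Log-transform the size-like aggregates; the Gini value is already in [0, 1).
        logged = {k: round(log_transform(v), 3) for k, v in feats.items() if k != "gini"}
        print(file, feats, logged)
```

Both hypothetical files have the same sum (40) and mean (10), yet their medians (10 vs. 1) and Gini coefficients (0.0 vs. about 0.675) differ. This illustrates why the choice of aggregation scheme can change the features a defect prediction model sees.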
