Software mining and fault prediction

Mining software repositories (MSR), such as source control repositories, bug repositories, deployment logs, and code repositories, yields patterns that are useful to practitioners. Instead of treating these repositories as passive record‐keeping stores, we need to transform them into active repositories that can guide decision processes inside the company. By applying data mining algorithms to these repositories, effective software fault prediction models can be built and error‐prone modules can be detected prior to the testing phase. We discuss numerous real‐world challenges in building accurate fault prediction models and present some solutions to these challenges. © 2012 Wiley Periodicals, Inc.
