A Study on Software Effort Prediction Using Machine Learning Techniques

This paper studies software effort prediction using machine learning techniques. Both supervised and unsupervised learning techniques are employed to predict software effort from historical datasets. For unsupervised learning, k-medoids clustering equipped with different similarity measures is used to cluster the projects in the historical datasets. For supervised learning, the J48 decision tree, back propagation neural network (BPNN), and naïve Bayes are used to classify software projects into different effort classes. We also impute the missing values in the historical datasets before applying the machine learning techniques to predict software effort. Experiments on the ISBSG and CSBSG datasets demonstrate that unsupervised learning with k-medoids clustering performed poorly. Among the similarity measures, the Kulzinsky coefficient performed best in measuring the similarities of projects. Supervised learning techniques outperformed unsupervised learning techniques in software effort prediction. BPNN produced the best performance among the three supervised learning techniques. Missing data imputation improved the performance of both unsupervised and supervised learning techniques in software effort prediction.
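
To make the clustering step concrete, the following is a minimal sketch of k-medoids driven by a similarity measure. It is not the paper's implementation. It assumes that projects are encoded as binary feature vectors and that the similarity is the second Kulczynski coefficient (the measure the abstract spells "Kulzinsky"), which is one common definition; the abstract does not state which variant was used. The function names, the toy data, and the initial medoids are all hypothetical.

```python
from typing import List, Sequence, Tuple


def kulczynski_similarity(x: Sequence[int], y: Sequence[int]) -> float:
    """Second Kulczynski coefficient for two binary vectors (assumed variant)."""
    a = sum(1 for xi, yi in zip(x, y) if xi and yi)      # present in both
    b = sum(1 for xi, yi in zip(x, y) if xi and not yi)  # present in x only
    c = sum(1 for xi, yi in zip(x, y) if yi and not xi)  # present in y only
    if a == 0:
        return 0.0  # no shared features; also avoids division by zero
    return 0.5 * (a / (a + b) + a / (a + c))


def assign(points: Sequence[Sequence[int]], medoids: List[int]) -> List[int]:
    """Label each point with the index of its most similar medoid."""
    return [
        max(range(len(medoids)),
            key=lambda j: kulczynski_similarity(p, points[medoids[j]]))
        for p in points
    ]


def k_medoids(points: Sequence[Sequence[int]],
              initial_medoids: Sequence[int],
              max_iter: int = 100) -> Tuple[List[int], List[int]]:
    """Alternate assignment and medoid update until the medoids stop changing."""
    medoids = list(initial_medoids)
    for _ in range(max_iter):
        labels = assign(points, medoids)
        new_medoids = []
        for j, current in enumerate(medoids):
            members = [i for i, lab in enumerate(labels) if lab == j]
            if not members:
                new_medoids.append(current)  # keep the medoid of an empty cluster
                continue
            # New medoid: the member with the highest total similarity to its cluster.
            best = max(members, key=lambda i: sum(
                kulczynski_similarity(points[i], points[m]) for m in members))
            new_medoids.append(best)
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(points, medoids)


# Toy data (hypothetical): two groups of projects with disjoint features.
projects = [
    (1, 1, 1, 0, 0, 0),
    (1, 1, 0, 0, 0, 0),
    (1, 0, 1, 0, 0, 0),
    (0, 0, 0, 1, 1, 1),
    (0, 0, 0, 1, 1, 0),
    (0, 0, 0, 0, 1, 1),
]
medoids, labels = k_medoids(projects, initial_medoids=[0, 3])
```

Swapping `kulczynski_similarity` for another measure is enough to compare similarity measures under the same clustering procedure, which is the kind of comparison the abstract describes.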
