Using parametric regression and KNN algorithm with missing handling for software effort prediction

Estimating software development cost, budget, and resources such as time and effort is one of the most important activities in software project management. Estimation error has a sizable influence on whether a project succeeds or fails. Estimates are generally derived from the histories of similar projects, and one challenge of this approach is missing values in the historical data. In this research, missing values are first handled with the K-nearest-neighbor (KNN) algorithm and with mean imputation; effort is then predicted with parametric model-based methods, namely nonlinear and polynomial (quadratic) regression. The proposed method is applied to the CM1 dataset, and the results show that the combination of KNN imputation and quadratic nonlinear regression gives the best results, improving accuracy and reducing relative error compared with the other approaches.
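
The two-stage pipeline described above (impute missing values, then fit a quadratic model) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the single-predictor quadratic form effort = w0 + w1*x + w2*x^2, the distance measure over shared attributes, and the use of the mean magnitude of relative error (MMRE) as the error measure are all assumptions made for illustration, and no CM1 data is used.

```python
import math

def knn_impute(rows, k=2):
    """Fill None entries with the mean of that column over the k nearest rows.

    Distance is the root mean squared difference over the attributes that
    both rows have observed (an assumed choice, not taken from the paper).
    """
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                if m == i or other[j] is None:
                    continue  # a donor must have the missing attribute observed
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                dist = math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda c: c[0])
            nearest = [v for _, v in candidates[:k]]
            if nearest:
                filled[i][j] = sum(nearest) / len(nearest)
    return filled

def fit_quadratic(xs, ys):
    """Least-squares fit of y = w0 + w1*x + w2*x^2 via the normal equations."""
    basis = [[1.0, x, x * x] for x in xs]
    n = 3
    ata = [[sum(b[r] * b[c] for b in basis) for c in range(n)] for r in range(n)]
    aty = [sum(b[r] * y for b, y in zip(basis, ys)) for r in range(n)]
    aug = [ata[r] + [aty[r]] for r in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution.
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (aug[r][n] - tail) / aug[r][r]
    return w

def predict_quadratic(w, x):
    """Evaluate the fitted quadratic at x."""
    return w[0] + w[1] * x + w[2] * x * x

def mmre(actual, predicted):
    """Mean magnitude of relative error; actual values must be nonzero."""
    return sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)
```

In this sketch a project record is a list of numeric attributes with `None` marking a missing value. After `knn_impute`, one attribute column would serve as `xs` and the recorded effort as `ys` for `fit_quadratic`; mean imputation could be swapped in for comparison by replacing each `None` with its column mean.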
