Missing Data Imputation Techniques for Software Effort Estimation: A Study of Recent Issues and Challenges

Software effort estimation is one of the critical aspects of software engineering. It revolves around predicting the effort required to complete a software task. However, any estimation technique or model relies on input data on which it bases its predictions of future values. Missing values within such data are a common occurrence in the software development industry, and they lead to inaccurate predictions or misleading results. Handling missing data is therefore an important aspect of effort estimation models that needs to be addressed. However, missing data handling is not without its own gaps and issues. This review elaborates on the recent issues and gaps that exist within the field of missing data and software effort estimation. This may allow future researchers to gain a better understanding of the inner workings of missing data and of the methods through which these challenges can be addressed.
