Investigating the Significance of Bellwether Effect to Improve Software Effort Estimation

The Bellwether effect refers to the existence of exemplary projects (called the Bellwether) within a historical dataset that can be used to improve prediction performance. Recent studies have implicitly assumed that using only recently completed projects (referred to as a moving window) improves prediction accuracy. In this paper, we investigate the Bellwether effect on software effort estimation accuracy using moving windows. The existence of the Bellwether was empirically demonstrated based on six postulations. We apply statistical stratification and Markov chain methodology to select the Bellwether moving window. The resulting Bellwether moving window is used to predict the software effort of a new project. Empirical results show that the Bellwether effect exists in chronological datasets, with a set of exemplary and recently completed projects representing the Bellwether moving window. The results of this study show that using the Bellwether moving window with the Gaussian weighting function significantly improves prediction accuracy.
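
As a rough illustration of the idea of a weighted moving window, the sketch below predicts effort from the most recent projects, giving newer projects larger Gaussian weights. It is a minimal sketch under stated assumptions and not the procedure used in the paper: the exact form of the Gaussian weighting, the `sigma` and `window` parameters, the log-linear size-effort model, and all function names are hypothetical, and the selection of the Bellwether window by stratification and Markov chains is not reproduced.

```python
import math

def gaussian_weights(n, sigma=0.5):
    """Weights for n projects ordered oldest -> newest (assumed form).

    The newest project gets weight 1; older projects decay with a
    Gaussian of their relative age. sigma is a hypothetical parameter.
    """
    if n == 1:
        return [1.0]
    # Relative age in [0, 1]: 0 for the newest project, 1 for the oldest.
    ages = [(n - 1 - i) / (n - 1) for i in range(n)]
    return [math.exp(-(a ** 2) / (2 * sigma ** 2)) for a in ages]

def weighted_fit(xs, ys, ws):
    """Closed-form weighted least squares for y = a + b * x."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def predict_effort(history, new_size, window=4, sigma=0.5):
    """Predict effort for a new project from the most recent projects.

    history: list of (size, effort) pairs ordered by completion date.
    Assumes a log-linear model log(effort) = a + b * log(size).
    """
    recent = history[-window:]  # the moving window
    xs = [math.log(size) for size, _ in recent]
    ys = [math.log(effort) for _, effort in recent]
    ws = gaussian_weights(len(recent), sigma)
    a, b = weighted_fit(xs, ys, ws)
    return math.exp(a + b * math.log(new_size))
```

For example, `predict_effort(history, new_size=50)` fits the model on the last four entries of `history` only. A smaller `sigma` concentrates the weight on the newest projects, while a large `sigma` approaches an unweighted window.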
