Complexity: Using Assemblies of Multiple Models

This part of the book Data Science for Software Engineering: Sharing Data and Models explores ensemble learners and multi-objective optimizers as applied to software engineering. Novel incremental ensemble learners are explained, along with one of the largest ensemble learning experiments in effort estimation yet attempted. It turns out that the specific goals of the learning affect what is learned and, for this reason, this part also explores multi-goal reasoning. We show that multi-goal optimizers can significantly improve effort estimation results.
