Auto-Associative Neural Networks to Improve the Accuracy of Estimation Models

Predicting software engineering variables with high accuracy is still an open problem. A primary reason may be that most models are linear in the parameters, so they are not sufficiently flexible and suffer from redundancy. In this chapter, we focus on improving regression models by decreasing their redundancy and increasing their parsimony, i.e., by turning a model into one with fewer variables. We present an empirical strategy for model improvement based on auto-associative neural networks, which implements a dimensionality reduction technique called Curvilinear Component Analysis. The contribution of this chapter is to show how multi-layer feedforward neural networks can be a useful and practical mechanism for improving software engineering estimation models.
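
To make the idea of an auto-associative network concrete, the following is a minimal sketch, not the chapter's implementation. It assumes a single tanh hidden layer that is narrower than the input, a linear output layer, plain gradient descent, and synthetic data; the function name `train_autoassociative`, the layer sizes, and the learning rate are all illustrative choices. The network is trained to reproduce its own inputs, so the hidden-layer activations can serve as a reduced set of variables for a downstream regression model.

```python
import numpy as np


def train_autoassociative(X, n_hidden, epochs=2000, lr=0.1, seed=0):
    """Train a one-hidden-layer auto-associative network (targets == inputs).

    Returns an encoder function and the mean squared reconstruction error
    recorded at each epoch. Illustrative sketch only.
    """
    rng = np.random.default_rng(seed)
    n_inputs = X.shape[1]
    W1 = rng.normal(0.0, 0.1, size=(n_inputs, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, size=(n_hidden, n_inputs))
    b2 = np.zeros(n_inputs)
    losses = []

    for _ in range(epochs):
        # Forward pass: nonlinear bottleneck, then linear reconstruction.
        H = np.tanh(X @ W1 + b1)
        X_hat = H @ W2 + b2
        err = X_hat - X
        losses.append(float(np.mean(err ** 2)))

        # Backward pass: gradients of the mean squared error.
        g_out = 2.0 * err / err.size
        gW2 = H.T @ g_out
        gb2 = g_out.sum(axis=0)
        g_hid = (g_out @ W2.T) * (1.0 - H ** 2)  # derivative of tanh
        gW1 = X.T @ g_hid
        gb1 = g_hid.sum(axis=0)

        W1 -= lr * gW1
        b1 -= lr * gb1
        W2 -= lr * gW2
        b2 -= lr * gb2

    def encode(Z):
        """Map inputs to the reduced (hidden-layer) representation."""
        return np.tanh(Z @ W1 + b1)

    return encode, losses


# Synthetic, deliberately redundant data (an assumption for illustration):
# four observed variables driven by two underlying factors plus small noise.
data_rng = np.random.default_rng(1)
latent = data_rng.normal(size=(50, 2))
mixing = data_rng.normal(size=(2, 4))
X = latent @ mixing + 0.05 * data_rng.normal(size=(50, 4))
X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize each variable

encode, losses = train_autoassociative(X, n_hidden=2)
reduced = encode(X)  # two variables per project instead of four
```

In this sketch, `reduced` would replace the original four variables as input to a regression model. How many hidden units to keep is a judgment call that the sketch does not address.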
