Software Effort Prediction Using Regression Rule Extraction from Neural Networks

Neural networks are often selected as a tool for software effort prediction because of their capability to approximate any continuous function with arbitrary accuracy. A major drawback of neural networks is the complex mapping between inputs and output, which is not easily understood by users. This paper describes a rule extraction technique that derives a set of comprehensible IF-THEN rules from a trained neural network applied to the domain of software effort prediction. The suitability of this technique is tested on the ISBSG R11 data set by comparing it with linear regression, radial basis function networks, and CART. The most accurate results are obtained by CART, though its large number of rules limits comprehensibility. Considering comprehensible models only, the concise set of extracted rules outperforms the pruned CART tree, making neural network rule extraction the most suitable technique for software effort prediction when comprehensibility is important.
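The abstract does not spell out the extraction algorithm. As a rough illustration of how a trained network can be turned into regression rules, the sketch below uses one common decompositional idea (an assumption here, not necessarily the paper's exact method): approximate each tanh hidden unit by a three-piece linear function, so that within each region of input space the network becomes a linear function of the inputs, yielding one IF-THEN rule per region. The network weights, the hard-tanh approximation, the output name `effort`, and all function names (`extract_rules`, `predict_with_rules`, `describe`) are hypothetical.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple

# Hypothetical trained network (2 inputs -> 2 tanh hidden units -> 1 linear output).
# These weights are invented for illustration and do not come from the paper.
W_HIDDEN = [[0.8, -0.5], [0.3, 0.9]]  # input weights, one row per hidden unit
B_HIDDEN = [0.1, -0.2]                # hidden-unit biases
V_OUT = [1.5, -0.7]                   # hidden-to-output weights
B_OUT = 0.4                           # output bias

SEGMENTS = ("low", "mid", "high")     # z <= -1, -1 < z < 1, z >= 1
Rule = Tuple[List[float], float]      # (input coefficients, intercept)


def hidden_inputs(x: Sequence[float]) -> List[float]:
    """Net input z_j = w_j . x + b_j of every hidden unit."""
    return [sum(w * xi for w, xi in zip(ws, x)) + b for ws, b in zip(W_HIDDEN, B_HIDDEN)]


def hard_tanh(z: float) -> float:
    """Three-piece linear approximation of tanh (assumed for this sketch)."""
    return max(-1.0, min(1.0, z))


def segment_of(z: float) -> str:
    """Which linear piece of hard_tanh the net input z falls on."""
    if z <= -1.0:
        return "low"
    if z >= 1.0:
        return "high"
    return "mid"


def approx_net(x: Sequence[float]) -> float:
    """Network output with every tanh replaced by hard_tanh."""
    return B_OUT + sum(v * hard_tanh(z) for v, z in zip(V_OUT, hidden_inputs(x)))


def extract_rules() -> Dict[Tuple[str, ...], Rule]:
    """One linear rule per combination of hidden-unit segments.

    Inside each region the approximated network is linear in x. Some regions
    may be empty; a fuller method would prune them (e.g. with an LP check).
    """
    n_inputs = len(W_HIDDEN[0])
    rules: Dict[Tuple[str, ...], Rule] = {}
    for combo in product(SEGMENTS, repeat=len(W_HIDDEN)):
        coefs, intercept = [0.0] * n_inputs, B_OUT
        for seg, ws, b, v in zip(combo, W_HIDDEN, B_HIDDEN, V_OUT):
            if seg == "low":        # unit saturated at -1
                intercept -= v
            elif seg == "high":     # unit saturated at +1
                intercept += v
            else:                   # linear piece: h = w . x + b
                intercept += v * b
                coefs = [c + v * w for c, w in zip(coefs, ws)]
        rules[combo] = (coefs, intercept)
    return rules


def predict_with_rules(rules: Dict[Tuple[str, ...], Rule], x: Sequence[float]) -> float:
    """Fire the single rule whose conditions x satisfies."""
    key = tuple(segment_of(z) for z in hidden_inputs(x))
    coefs, intercept = rules[key]
    return intercept + sum(c * xi for c, xi in zip(coefs, x))


def describe(combo: Tuple[str, ...], rule: Rule) -> str:
    """Render a rule as IF-THEN text; z_j denotes w_j . x + b_j."""
    cond = {"low": "z{} <= -1", "mid": "-1 < z{} < 1", "high": "z{} >= 1"}
    ifs = " AND ".join(cond[s].format(j + 1) for j, s in enumerate(combo))
    coefs, intercept = rule
    rhs = " + ".join(f"{c:.2f}*x{i + 1}" for i, c in enumerate(coefs))
    return f"IF {ifs} THEN effort = {intercept:.2f} + {rhs}"


if __name__ == "__main__":
    rules = extract_rules()
    for combo, rule in rules.items():
        print(describe(combo, rule))
```

Running the script prints nine rules, one per combination of hidden-unit segments, and `predict_with_rules` reproduces `approx_net` exactly. A practical extractor would typically also drop empty regions and merge near-identical rules, which is what keeps the resulting rule set small enough to read.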
