Connectionist Models for Learning, Discovering, and Forecasting Software Effort: An Empirical Study

Mining information and knowledge from very large databases is recognized as a key research area in machine learning and expert systems. We show that connectionist models can learn software effort even when the available data set is small. We train a connectionist model on a small data set of several software projects and validate it on a holdout sample. We describe several design issues that arise when developing connectionist models on a small training data set. Our research indicates that connectionist models, when carefully designed, can aid knowledge discovery and software effort estimation.
