Ensemble modelling or selecting the best model: Many could be better than one

In the course of data modelling, many models may be created. Much work has been done on formulating guidelines for model selection; by and large, however, these guidelines are either too conservative or too specific. Instead of relying on general guidelines, models can be selected for a particular task based on statistical tests. Yet when a single model is selected, the others are discarded. Rather than losing these potential sources of information, models can be combined to yield better performance. We review the basics of model selection and combination and discuss their differences. Two examples of combination are presented, one opportunistic and one principled. The first demonstrates that models of mediocre quality can be combined to yield significantly better performance. The second, which is the main contribution of the paper, describes and illustrates a novel heuristic approach called the SG(k-NN) ensemble for generating good-quality and diverse models; it can improve even excellent-quality models.
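The central claim, that combining several mediocre models can beat any one of them, can be sketched with simple ensemble averaging. The sketch below is illustrative only and is not the paper's SG(k-NN) method: it builds a handful of noisy linear models of a known target function (all names and parameters here are invented for the demonstration) and shows that the averaged ensemble's squared error is no worse than the individual models' average error, as guaranteed by the ambiguity decomposition for squared loss.

```python
# Hedged illustration of model combination (NOT the paper's SG(k-NN)
# algorithm): averaging several imperfect models can outperform them
# individually when their errors are partly independent.
import random

random.seed(0)

def true_fn(x):
    # The target function the models are trying to approximate.
    return 2.0 * x + 1.0

# A few "mediocre" linear models, each with noisily perturbed parameters.
models = [(2.0 + random.gauss(0, 0.5), 1.0 + random.gauss(0, 0.5))
          for _ in range(10)]

xs = [i / 10 for i in range(-20, 21)]  # evaluation points

def mse(predict):
    # Mean squared error of a predictor against the target function.
    return sum((predict(x) - true_fn(x)) ** 2 for x in xs) / len(xs)

individual_errors = [mse(lambda x, a=a, b=b: a * x + b) for a, b in models]

def ensemble(x):
    # Simple ensemble: average the member predictions.
    return sum(a * x + b for a, b in models) / len(models)

ensemble_error = mse(ensemble)
mean_individual_error = sum(individual_errors) / len(individual_errors)

# For squared loss, the averaged ensemble is never worse than the
# average member (Krogh & Vedelsby's ambiguity decomposition).
print(ensemble_error <= mean_individual_error)
```

The inequality printed at the end always holds for squared-error averaging; how much the ensemble gains depends on how diverse (mutually disagreeing) the members are, which is exactly why generating diverse models matters.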
