Model Combination in the Multiple-Data-Batches Scenario

The approach of combining models learned from multiple batches of data provides an alternative to the common practice of learning one model from all the available data (i.e., the data combination approach). This paper empirically examines the baseline behaviour of the model combination approach in this multiple-data-batches scenario. We find that model combination can lead to better performance even if the disjoint batches of data are drawn randomly from a larger sample, and relate the relative performance of the two approaches to the learning curve of the classifier used. The practical implication of our results is that one should consider using model combination rather than data combination, especially when multiple batches of data for the same task are readily available. Another interesting result is the empirical demonstration that the near-asymptotic performance of a single model, in some classification tasks, can be significantly improved by combining multiple models (derived from the same algorithm), provided the constituent models are substantially different and there is some regularity in the models for the combination method to exploit. Comparisons with known theoretical results are also provided.

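The contrast between the two approaches can be illustrated with a minimal sketch. The code below is not the paper's exact experimental protocol; the dataset, the choice of decision trees as the base learner, the number of batches, and the simple majority-vote combination are all illustrative assumptions, intended only to show "data combination" (one model from all the data) next to "model combination" (one model per disjoint batch, predictions merged by voting).

```python
# A minimal sketch, not the paper's protocol: dataset, classifier, batch count,
# and majority voting are illustrative assumptions.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Data combination: a single model learned from all available training data.
single = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
acc_single = accuracy_score(y_test, single.predict(X_test))

# Model combination: one model per disjoint batch, combined by majority vote.
n_batches = 5
batches = np.array_split(
    np.random.RandomState(0).permutation(len(X_train)), n_batches)
models = [DecisionTreeClassifier(random_state=0).fit(X_train[idx], y_train[idx])
          for idx in batches]
votes = np.stack([m.predict(X_test) for m in models])  # shape: (n_batches, n_test)
majority = (votes.mean(axis=0) >= 0.5).astype(int)     # majority vote (binary labels)
acc_combined = accuracy_score(y_test, majority)

print(f"data combination:  {acc_single:.3f}")
print(f"model combination: {acc_combined:.3f}")
```

Whether the combined models beat the single model in such a sketch depends, as the paper argues, on where the batch size falls on the base classifier's learning curve.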