Using Correspondence Analysis to Combine Classifiers

Several effective methods have recently been developed for improving predictive performance by generating and combining multiple learned models. The general approach is to create a set of learned models, either by applying an algorithm repeatedly to different versions of the training data or by applying different learning algorithms to the same data, and then to combine the models' predictions according to a voting scheme. This paper focuses on the task of combining the predictions of a set of learned models. The method described uses the strategies of stacking and correspondence analysis to model the relationship between the training examples and their classification by a collection of learned models. A nearest-neighbor method is then applied within the resulting representation to classify previously unseen examples. The new algorithm performs no worse than, and frequently significantly better than, other combining techniques on a suite of data sets.
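The pipeline sketched below gives a rough picture of how such a combiner can be built: the level-0 models' predictions and the true class are encoded as an indicator matrix (the stacking step), correspondence analysis maps its rows and columns into a common low-dimensional space, and a new example is placed in that space and labeled by the nearest true-class point. This is only a minimal sketch under those assumptions; the function names, the eps guard, and the handling of the unknown class for test examples are illustrative choices, not the paper's published algorithm.

```python
import numpy as np


def indicator_matrix(preds, y_true, n_classes):
    """Stack one-hot blocks: one block per level-0 model, plus the true class."""
    blocks = [np.eye(n_classes)[p] for p in preds]        # model prediction blocks
    blocks.append(np.eye(n_classes)[y_true])              # true-class block
    return np.hstack(blocks)


def fit_ca(N, n_dims=2, eps=1e-12):
    """Correspondence analysis of an indicator matrix via SVD of the
    standardized residuals; eps guards against classes a model never predicts."""
    P = N / N.sum()
    r = P.sum(axis=1)                                      # row masses
    c = P.sum(axis=0)                                      # column masses
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c) + eps)
    U, s, Vt = np.linalg.svd(S, full_matrices=False)
    phi = Vt.T[:, :n_dims] / np.sqrt(c + eps)[:, None]     # column standard coords
    col_coords = phi * s[:n_dims]                          # column principal coords
    return phi, col_coords


def predict(preds_new, phi, col_coords, n_classes):
    """Project a test example's prediction profile as a supplementary row and
    return the class whose true-class column point lies nearest."""
    profile = np.concatenate([np.eye(n_classes)[p] for p in preds_new])
    profile = np.concatenate([profile, np.zeros(n_classes)])  # true class unknown
    point = (profile / profile.sum()) @ phi                # supplementary row coords
    class_points = col_coords[-n_classes:]                 # true-class column points
    return int(np.argmin(np.linalg.norm(class_points - point, axis=1)))
```

The sketch stays in plain NumPy so that the correspondence-analysis step (an SVD of the standardized residual matrix) is explicit rather than hidden behind a library call; the number of retained dimensions, the distance metric, and the treatment of the unknown-class block are exactly the design decisions the paper studies.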
