Creating diversity in ensembles using artificial data

Abstract The diversity of an ensemble of classifiers is known to be an important factor in determining its generalization error. We present a new method for generating ensembles, Decorate (Diverse Ensemble Creation by Oppositional Relabeling of Artificial Training Examples), that directly constructs diverse hypotheses using additional artificially constructed training examples. The technique is a simple, general meta-learner that can use any strong learner as a base classifier to build diverse committees. Experimental results using decision-tree induction as a base learner demonstrate that this approach consistently achieves higher predictive accuracy than the base classifier, Bagging, and Random Forests. Decorate also obtains higher accuracy than Boosting on small training sets, and achieves comparable performance on larger training sets.
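As a rough illustration of the idea described in the abstract, the sketch below builds a Decorate-style ensemble with scikit-learn decision trees. It is a minimal sketch, not the authors' reference implementation: it assumes numeric features only, samples artificial examples from per-feature Gaussians fit to the training data, labels them with probability inversely proportional to the current ensemble's class probabilities, and keeps a candidate member only if ensemble training error does not increase. All names (decorate, r_art, max_iters) are illustrative.

```python
# Hedged sketch of a Decorate-style ensemble builder (numeric features only).
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def decorate(X, y, ensemble_size=10, max_iters=50, r_art=1.0, random_state=0):
    rng = np.random.default_rng(random_state)
    classes = np.unique(y)
    n_art = int(r_art * len(X))

    def ensemble_proba(members, X_):
        # Average class probabilities over all committee members.
        return np.mean([m.predict_proba(X_) for m in members], axis=0)

    def ensemble_error(members):
        pred = classes[np.argmax(ensemble_proba(members, X), axis=1)]
        return np.mean(pred != y)

    # Start with one classifier trained on the original data.
    ensemble = [DecisionTreeClassifier(random_state=random_state).fit(X, y)]
    best_err = ensemble_error(ensemble)

    for _ in range(max_iters):
        if len(ensemble) >= ensemble_size:
            break
        # 1. Sample artificial examples from per-feature Gaussians fit to X.
        X_art = rng.normal(X.mean(axis=0), X.std(axis=0) + 1e-9,
                           size=(n_art, X.shape[1]))
        # 2. Label them "oppositionally": probability of each class is
        #    inversely proportional to the ensemble's predicted probability.
        p = ensemble_proba(ensemble, X_art)
        inv = 1.0 / (p + 1e-9)
        inv /= inv.sum(axis=1, keepdims=True)
        y_art = np.array([rng.choice(classes, p=row) for row in inv])
        # 3. Train a candidate member on original + artificial data.
        cand = DecisionTreeClassifier(random_state=random_state).fit(
            np.vstack([X, X_art]), np.concatenate([y, y_art]))
        # 4. Keep the candidate only if ensemble training error does not rise.
        err = ensemble_error(ensemble + [cand])
        if err <= best_err:
            ensemble.append(cand)
            best_err = err
    return ensemble
```

At prediction time, one would average `predict_proba` over the returned members and take the argmax, as in `ensemble_proba` above; the artificial-example ratio `r_art` plays the role of the diversity knob described in the paper.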
