Creating ensemble of diverse maximum entropy models

The diversity of a classifier ensemble has been shown to benefit overall classification performance. However, most conventional methods of training ensembles offer no control over the extent of diversity and operate as meta-learners. We present a method for creating an ensemble of diverse maximum entropy (∂MaxEnt) models, which are popular in speech and language processing. We modify the objective function used for conventional training of a MaxEnt model so that its output posterior distribution is diverse with respect to a reference model. Two diversity scores are explored: KL divergence and posterior cross-correlation. Experiments on the CoNLL-2003 Named Entity Recognition task and the IEMOCAP emotion recognition database show the benefits of a ∂MaxEnt ensemble.
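The following Python sketch illustrates the general idea under stated assumptions, not the authors' exact formulation: a MaxEnt (multinomial logistic regression) model is trained by minimizing the usual negative log-likelihood minus a weighted KL divergence between its posteriors and those of a fixed reference model, so the new member is rewarded for disagreeing with the reference. The toy data, the weight `lam`, and helper names such as `fit_dmaxent` are illustrative assumptions; the L-BFGS optimizer mirrors the one cited in [5].

```python
# Minimal sketch of a diversity-penalized MaxEnt objective (assumed form):
#   J(W) = NLL(W) - lam * mean KL( p_W(y|x) || p_ref(y|x) )
import numpy as np
from scipy.optimize import minimize

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def posteriors(W, X):
    # W: (num_classes, num_features), X: (num_samples, num_features)
    return softmax(X @ W.T)

def objective(w_flat, X, y, p_ref, lam, num_classes):
    W = w_flat.reshape(num_classes, X.shape[1])
    P = posteriors(W, X)
    nll = -np.mean(np.log(P[np.arange(len(y)), y] + 1e-12))
    # Average KL(P || P_ref); subtracting it encourages divergence from the reference.
    kl = np.mean(np.sum(P * (np.log(P + 1e-12) - np.log(p_ref + 1e-12)), axis=1))
    return nll - lam * kl

def fit_dmaxent(X, y, p_ref, lam, num_classes):
    w0 = np.zeros(num_classes * X.shape[1])
    res = minimize(objective, w0, args=(X, y, p_ref, lam, num_classes),
                   method="L-BFGS-B")  # gradients approximated numerically here
    return res.x.reshape(num_classes, X.shape[1])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)
    # Reference member: a plain MaxEnt model (diversity weight set to zero).
    W_ref = fit_dmaxent(X, y, np.full((200, 2), 0.5), 0.0, 2)
    p_ref = posteriors(W_ref, X)
    # Diverse member: same data, penalized toward disagreeing with the reference.
    W_div = fit_dmaxent(X, y, p_ref, 0.5, 2)
    P_div = posteriors(W_div, X)
    kl = np.mean(np.sum(P_div * (np.log(P_div + 1e-12) - np.log(p_ref + 1e-12)), axis=1))
    print("mean KL from reference:", kl)
```

The posterior cross-correlation score mentioned in the abstract would slot into the same place as the KL term: replace `kl` with a correlation between the two models' posterior vectors and penalize (rather than reward) it.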

[1] Erik F. Tjong Kim Sang et al., Introduction to the CoNLL-2003 Shared Task: Language-Independent Named Entity Recognition, 2003, CoNLL.

[2] Ludmila I. Kuncheva et al., Measures of Diversity in Classifier Ensembles and Their Relationship with the Ensemble Accuracy, 2003, Machine Learning.

[3] Kagan Tumer et al., Analysis of decision boundaries in linearly combined neural classifiers, 1996, Pattern Recognition.

[4] Carlos Busso et al., IEMOCAP: interactive emotional dyadic motion capture database, 2008, Language Resources and Evaluation.

[5] Jorge Nocedal et al., On the limited memory BFGS method for large scale optimization, 1989, Mathematical Programming.

[6] Brian Kingsbury et al., The IBM 2008 GALE Arabic speech transcription system, 2010, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[7] Ludmila I. Kuncheva et al., Combining Pattern Classifiers: Methods and Algorithms, 2004.

[8] Raymond J. Mooney et al., Constructing Diverse Classifier Ensembles using Artificial Training Examples, 2003, IJCAI.

[9] Subhash C. Bagui et al., Combining Pattern Classifiers: Methods and Algorithms, 2005, Technometrics.

[10] Björn Schuller et al., openSMILE: the Munich versatile and fast open-source audio feature extractor, 2010, ACM Multimedia.

[11] Tara N. Sainath et al., Application specific loss minimization using gradient boosting, 2011, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[12] Thomas G. Dietterich, Multiple Classifier Systems, 2000, Lecture Notes in Computer Science.

[13] Christopher D. Manning et al., Incorporating Non-local Information into Information Extraction Systems by Gibbs Sampling, 2005, ACL.

[14] Naonori Ueda et al., Generalization error of ensemble estimators, 1996, Proceedings of the International Conference on Neural Networks (ICNN'96).

[15] Xin Yao et al., Ensemble learning via negative correlation, 1999, Neural Networks.

[16] Yoav Freund et al., A decision-theoretic generalization of on-line learning and an application to boosting, 1995, EuroCOLT.

[17] James Bennett et al., The Netflix Prize, 2007.

[18] Jianying Hu et al., Winning the KDD Cup Orange Challenge with Ensemble Selection, 2009, KDD Cup.

[19] Leo Breiman et al., Bagging Predictors, 1996, Machine Learning.