Learning a Product of Experts with Elitist Lasso

Discriminative models such as logistic regression benefit from the ability to incorporate arbitrary rich features; however, complex dependencies among overlapping features can often result in weight undertraining. One popular method that attempts to mitigate this problem is logarithmic opinion pools (LOP), a specialized form of product-of-experts model that automatically adjusts the weighting among experts. A major problem with LOP is that it requires significant amounts of domain expertise in designing effective experts. We propose a novel method that learns to induce experts, not just the weighting between them, through the use of a mixed ℓ2ℓ1 norm as previously seen in elitist lasso. Unlike its more popular sibling, the ℓ1ℓ2 norm used in group lasso, which seeks feature sparsity at the group level, the ℓ2ℓ1 norm encourages sparsity within feature groups. We demonstrate how this property can be leveraged as a competition mechanism to induce groups of diverse experts, and introduce a new formulation of elitist lasso MaxEnt in the FOBOS optimization framework (Duchi and Singer, 2009). Results on the Named Entity Recognition task suggest that this method gives consistent improvements over a standard logistic regression model, and is more effective than conventional induction schemes for experts.
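To make the within-group sparsity mechanism concrete, the sketch below shows one FOBOS iteration with one common form of the elitist lasso penalty, where each feature group g is charged (lam/2)(Σ_{j∈g} |w_j|)², a squared ℓ1 norm inside the group summed across groups. This is a minimal illustrative sketch of that penalty's closed-form proximal step (group-wise soft-thresholding with a data-dependent threshold), not the paper's implementation, and it assumes the squared-ℓ1-per-group formulation from Kowalski and Torrésani (2009); the function and variable names (elitist_prox, fobos_step, groups, eta, lam) are hypothetical.

    import numpy as np

    def elitist_prox(v, lam):
        # Proximal operator of (lam / 2) * (sum_j |v_j|)^2 for one feature group:
        # soft-thresholding with a threshold that depends on the group itself.
        a = np.sort(np.abs(v))[::-1]              # magnitudes, largest first
        cums = np.cumsum(a)
        m = np.arange(1, a.size + 1)
        keep = a > lam * cums / (1.0 + lam * m)   # candidate support sizes
        if not keep.any():
            return np.zeros_like(v)
        M = np.nonzero(keep)[0][-1] + 1           # number of surviving features
        tau = lam * cums[M - 1] / (1.0 + lam * M)
        return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

    def fobos_step(w, grad, groups, eta, lam):
        # One FOBOS iteration: unregularized gradient step, then the
        # group-wise elitist prox with effective strength eta * lam.
        v = w - eta * grad
        w_new = v.copy()                          # features outside any group are untouched
        for g in groups:                          # groups: list of index arrays
            w_new[g] = elitist_prox(v[g], eta * lam)
        return w_new

Within each group the threshold tau grows with the group's total mass, so features compete: the few strongest survive while the rest are zeroed, which is the within-group sparsity described in the abstract.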

[1] H. Zou et al., Regularization and variable selection via the elastic net, 2005.

[2] T. Heskes et al., Selecting Weighting Factors in Logarithmic Opinion Pools, 1997, NIPS.

[3] L. Burget et al., Empirical Evaluation and Combination of Advanced Language Modeling Techniques, 2011, INTERSPEECH.

[4] T. Zhang et al., A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data, 2005, J. Mach. Learn. Res.

[5] L. Breiman et al., Bagging Predictors, 1996, Machine Learning.

[6] M. Kenward et al., An Introduction to the Bootstrap, 2007.

[7] N. A. Smith et al., Graph-Based Lexicon Expansion with Sparsity-Inducing Penalties, 2012, NAACL.

[8] C. D. Manning et al., Incorporating Non-local Information into Information Extraction Systems by Gibbs Sampling, 2005, ACL.

[9] F. Yvon et al., Practical Very Large Scale CRFs, 2010, ACL.

[10] A. McCallum et al., Reducing Weight Undertraining in Structured Discriminative Learning, 2006, NAACL.

[11] W. Daelemans et al., Applying System Combination to Base Noun Phrase Identification, 2000, COLING.

[12] M. Welling et al., Products of Experts, 2007.

[13] T. Cohn et al., Logarithmic Opinion Pools for Conditional Random Fields, 2005, ACL.

[14] Y. Bengio et al., Word Representations: A Simple and General Method for Semi-Supervised Learning, 2010, ACL.

[15] B. Torrésani et al., Sparsity and persistence: mixed norms provide simple signal models with dependent coefficients, 2009, Signal Image Video Process.

[16] Y. Singer et al., Efficient Online and Batch Learning Using Forward Backward Splitting, 2009, J. Mach. Learn. Res.

[17] C. D. Manning et al., Fast dropout training, 2013, ICML.

[18] N. Srivastava et al., Improving neural networks by preventing co-adaptation of feature detectors, 2012, arXiv.

[19] J. DeNero et al., Model Combination for Machine Translation, 2010, HLT-NAACL.

[20] N. A. Smith et al., Structured Sparsity in Structured Prediction, 2011, EMNLP.

[21] E. F. Tjong Kim Sang et al., Introduction to the CoNLL-2003 Shared Task: Language-Independent Named Entity Recognition, 2003, CoNLL.

[22] M. I. Jordan, Leo Breiman, 2011, arXiv:1101.0929.