Feature Selection for a Rich HPSG Grammar Using Decision Trees

This paper examines feature selection for log-linear models over rich constraint-based grammar (HPSG) representations by building decision trees over features in corresponding probabilistic context-free grammars (PCFGs). We show that single decision trees do not make optimal use of the available information; ensembles of decision trees constructed over different feature subspaces yield significant performance gains (a 14% reduction in parse selection error). We also compare the performance of the learned PCFG grammars and log-linear models over the same features.
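
The idea of combining trees built over different feature subspaces can be illustrated with a minimal sketch. This is not the paper's implementation: the toy data, the function names (`train_member`, `member_distribution`, `ensemble_predict`), the add-one smoothing, and the uniform averaging of member distributions are all assumptions made for illustration. Each ensemble member is a stand-in for a decision tree: a fully expanded tree over its feature subspace, stored as a table of leaf counts.

```python
from collections import Counter, defaultdict

def train_member(rows, labels, subspace):
    """Count labels per combination of values of the features in `subspace`.

    This behaves like a fully grown tree that splits on every feature
    in the subspace (an illustrative simplification, not a pruned tree).
    """
    counts = defaultdict(Counter)
    for row, label in zip(rows, labels):
        key = tuple(row[f] for f in subspace)
        counts[key][label] += 1
    return subspace, counts

def member_distribution(member, row, classes):
    """Return the add-one smoothed P(class | row) from one member."""
    subspace, counts = member
    key = tuple(row[f] for f in subspace)
    leaf = counts.get(key, Counter())
    total = sum(leaf.values()) + len(classes)
    return {c: (leaf[c] + 1) / total for c in classes}

def ensemble_predict(members, row, classes):
    """Average the member distributions and pick the most probable class."""
    avg = {c: 0.0 for c in classes}
    for member in members:
        dist = member_distribution(member, row, classes)
        for c in classes:
            avg[c] += dist[c] / len(members)
    best = max(classes, key=lambda c: avg[c])
    return best, avg

# Hypothetical toy data: three categorical features per example.
rows = [("x", "p", "u"), ("x", "q", "u"), ("y", "p", "v"), ("y", "q", "v")]
labels = ["A", "A", "B", "B"]
classes = ["A", "B"]

# Each member sees a different subset of feature indices.
subspaces = [(0, 1), (1, 2), (0, 2)]
members = [train_member(rows, labels, s) for s in subspaces]

prediction, distribution = ensemble_predict(members, ("x", "p", "u"), classes)
print(prediction, distribution)
```

Averaging is only one way to combine members; a weighted or product combination would fit the same interface by changing `ensemble_predict`.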
