Investigating GIS and Smoothing for Maximum Entropy Taggers

This paper investigates two elements of Maximum Entropy tagging: the use of a correction feature in the Generalised Iterative Scaling (GIS) estimation algorithm, and techniques for model smoothing. We show analytically and empirically that the correction feature, assumed to be required for the correctness of GIS, is unnecessary. We also explore the use of a Gaussian prior and a simple cutoff for smoothing. The experiments are performed with two tagsets: the standard Penn Treebank POS tagset and the larger set of lexical types from Combinatory Categorial Grammar.

[1]  Stanley F. Chen,et al.  A Gaussian Prior for Smoothing Maximum Entropy Models , 1999 .

[2]  Michael Collins,et al.  Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms , 2002, EMNLP.

[3]  Mitchell P. Marcus,et al.  Maximum entropy models for natural language ambiguity resolution , 1998 .

[4]  Joshua Goodman,et al.  Sequential Conditional Generalized Iterative Scaling , 2002, ACL.

[5]  Mark Steedman,et al.  The syntactic process , 2004, Language, speech, and communication.

[6]  Thorsten Brants,et al.  TnT – A Statistical Part-of-Speech Tagger , 2000, ANLP.

[7]  Adam L. Berger,et al.  A Maximum Entropy Approach to Natural Language Processing , 1996, CL.

[8]  Andrew McCallum,et al.  Using Maximum Entropy for Text Classification , 1999 .

[9]  Rob Koeling Chunking with Maximum Entropy Models , 2000, CoNLL/LLL.

[10]  Rob Malouf,et al.  A Comparison of Algorithms for Maximum Entropy Parameter Estimation , 2002, CoNLL.

[11]  J. Darroch,et al.  Generalized Iterative Scaling for Log-Linear Models , 1972 .

[12]  Mark Steedman,et al.  Acquiring Compact Lexicalized Grammars from a Cleaner Treebank , 2002, LREC.

[13]  John D. Lafferty,et al.  Inducing Features of Random Fields , 1995, IEEE Trans. Pattern Anal. Mach. Intell..

[14]  Adwait Ratnaparkhi,et al.  Learning to Parse Natural Language with Maximum Entropy Models , 1999, Machine Learning.

[15]  Ronald Rosenfeld,et al.  A maximum entropy approach to adaptive statistical language modelling , 1996, Comput. Speech Lang..

[16]  Walter Daelemans,et al.  Forgetting Exceptions is Harmful in Language Learning , 1998, Machine Learning.

[17]  Adam Berger,et al.  The Improved Iterative Scaling Algorithm A Gentle Introduction , 2003 .

[18]  Mark Johnson,et al.  Estimators for Stochastic “Unification-Based” Grammars , 1999, ACL.

[19]  Walter Daelemans,et al.  Improving Accuracy in word class tagging through the Combination of Machine Learning Systems , 2001, CL.

[20]  Christopher D. Manning,et al.  Enriching the Knowledge Sources Used in a Maximum Entropy Part-of-Speech Tagger , 2000, EMNLP.