Reinforcing Parser Preferences through Tagging

Lexical ambiguity is an important source of inefficiency for wide-coverage HPSG parsing. In this paper, we propose a lexical analysis filter which removes unlikely lexical categories. The filter is implemented as a straightforward HMM n-gram POS-tagger, which computes the a posteriori probability of each lexical category. A lexical category is removed if a competing lexical category is sufficiently more likely. The novel aspect of our approach is that the tagger is trained on the output of the parser itself; therefore there is no need for hand-annotated material. Use of this filter increases the speed of the parser considerably and, in addition, improves parsing accuracy.
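The filtering step described above can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a bigram HMM (the abstract only says "n-gram"), hand-picked toy probabilities, and a ratio-style threshold, and the names `tag_posteriors`, `filter_categories`, and `threshold` are hypothetical. Training the tagger on parser output is not shown.

```python
from typing import Dict, List, Tuple

Candidates = List[Dict[str, float]]  # per word: tag -> P(word | tag)


def tag_posteriors(candidates: Candidates,
                   trans: Dict[Tuple[str, str], float],
                   start: Dict[str, float]) -> List[Dict[str, float]]:
    """Forward-backward over a bigram HMM (assumed model order).

    Returns, for each position, the posterior probability of every
    candidate tag given the whole sentence.
    """
    n = len(candidates)
    # Forward pass: probability of the prefix ending in tag t at position i.
    fwd: List[Dict[str, float]] = [dict() for _ in range(n)]
    for t, e in candidates[0].items():
        fwd[0][t] = start.get(t, 0.0) * e
    for i in range(1, n):
        for t, e in candidates[i].items():
            fwd[i][t] = e * sum(fwd[i - 1][p] * trans.get((p, t), 0.0)
                                for p in fwd[i - 1])
    # Backward pass: probability of the suffix after position i, given tag t.
    bwd: List[Dict[str, float]] = [dict() for _ in range(n)]
    for t in candidates[-1]:
        bwd[-1][t] = 1.0
    for i in range(n - 2, -1, -1):
        for t in candidates[i]:
            bwd[i][t] = sum(trans.get((t, q), 0.0) * candidates[i + 1][q]
                            * bwd[i + 1][q] for q in candidates[i + 1])
    total = sum(fwd[-1].values())
    if total == 0.0:
        raise ValueError("sentence has zero probability under the model")
    return [{t: fwd[i][t] * bwd[i][t] / total for t in candidates[i]}
            for i in range(n)]


def filter_categories(posteriors: List[Dict[str, float]],
                      threshold: float = 10.0) -> List[List[str]]:
    """Drop a tag when the best competitor is more than `threshold` times
    as likely (an assumed reading of "sufficiently more likely")."""
    kept = []
    for dist in posteriors:
        best = max(dist.values())
        kept.append([t for t, p in dist.items() if p * threshold >= best])
    return kept


# Toy example (all numbers invented): "the can", where "can" is noun or verb.
start = {"det": 0.8, "noun": 0.1, "verb": 0.1}
trans = {("det", "noun"): 0.9, ("det", "verb"): 0.01}
sentence = [{"det": 1.0}, {"noun": 0.5, "verb": 0.5}]

posteriors = tag_posteriors(sentence, trans, start)
print(filter_categories(posteriors, threshold=10.0))
```

In this toy setting the noun reading of the second word is about 90 times as likely as the verb reading, so a threshold of 10 removes the verb category while a threshold of 100 would keep both. A larger threshold therefore trades parsing speed for a lower risk of discarding a category the parser needs.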
