Alpino: Wide-coverage Computational Analysis of Dutch

Alpino is a wide-coverage computational analyzer of Dutch that aims at accurate, full parsing of unrestricted text. We describe the head-driven lexicalized grammar and the lexical component, which has been derived from existing resources. The grammar produces dependency structures, thus providing a reasonably abstract and theory-neutral level of linguistic representation. Robustness and disambiguation are important aspects of wide-coverage parsing. The dependency relations encoded in the dependency structures have been used to develop and evaluate both hand-coded and statistical disambiguation methods.
