Dependency Parsing and Domain Adaptation with LR Models and Parser Ensembles

We present a data-driven variant of the LR algorithm for dependency parsing, and extend it with a best-first search for probabilistic generalized LR dependency parsing. Parser actions are determined by a classifier, based on features that represent the current state of the parser. We apply this parsing framework to both tracks of the CoNLL 2007 shared task, in each case taking advantage of multiple models trained with different learners. In the multilingual track, we train three LR models for each of the ten languages, and combine the analyses produced by the individual models using a maximum spanning tree voting scheme. In the domain adaptation track, we use two models to parse unlabeled data in the target domain to supplement the labeled out-of-domain training set, in a scheme similar to one iteration of co-training.
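To make the voting idea concrete, the following is a minimal sketch and not the procedure used in the paper. It assumes each model outputs one head index per word (0 for an artificial root), that every arc is weighted by the number of models proposing it, and that the combined analysis is the highest-scoring well-formed tree built from proposed arcs. The names `is_tree` and `combine_by_voting` are hypothetical, and the exhaustive search stands in for an efficient maximum spanning tree algorithm such as Chu-Liu-Edmonds, so it is only suitable for very short sentences.

```python
from collections import Counter
from itertools import product


def is_tree(heads):
    """Return True if every word reaches the root (0) without a cycle.

    heads[i] is the head of word i+1; words are numbered from 1.
    """
    for start in range(1, len(heads) + 1):
        seen = set()
        node = start
        while node != 0:
            if node in seen:
                return False  # revisited a word: cycle (includes self-heads)
            seen.add(node)
            node = heads[node - 1]
    return True


def combine_by_voting(parses):
    """Pick the highest-vote tree over arcs proposed by at least one parse.

    Assumption: each arc's weight is the number of parses that contain it.
    Brute force over candidate heads; exponential in sentence length.
    """
    n = len(parses[0])
    votes = Counter()
    for heads in parses:
        for dep, head in enumerate(heads, start=1):
            votes[(head, dep)] += 1

    # For each word, only consider heads that some parse proposed.
    candidates = [sorted({p[i] for p in parses}) for i in range(n)]

    best, best_score = None, -1
    for heads in product(*candidates):
        if not is_tree(heads):
            continue
        score = sum(votes[(h, d)] for d, h in enumerate(heads, start=1))
        if score > best_score:
            best, best_score = list(heads), score
    return best, best_score


# Three hypothetical parses of a three-word sentence. Each is a valid tree,
# but per-word majority voting yields heads [2, 3, 1], which is a cycle.
parses = [
    [2, 3, 0],
    [0, 3, 1],
    [2, 0, 1],
]
majority = [Counter(p[i] for p in parses).most_common(1)[0][0] for i in range(3)]
combined, score = combine_by_voting(parses)
```

In this toy input, simple per-word majority voting does not produce a tree, which is the situation a tree-constrained combination is meant to handle. The sketch does not enforce a single root or handle dependency labels; both would need to be added for real data.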
