Cross-Language Parser Adaptation between Related Languages

This paper describes an approach to adapting a parser to a new language. We assume that the target language has far fewer linguistic resources than the source language. The technique has been tested on two European languages because test data were available for them; however, it can readily be applied to any pair of sufficiently related languages, including some languages of the Indic group. Using only existing annotations in the source language, our adaptation technique achieves performance equivalent to that obtained by training on 1546 trees in the target language.
