SMT Helps Bitext Dependency Parsing

We propose a method that uses statistical machine translation (SMT) systems to improve the accuracy of parsing bilingual texts (bitexts). Previous bitext parsing methods rely on human-annotated bilingual treebanks, which are hard to obtain. Our approach instead uses an auto-generated bilingual treebank to produce bilingual constraints. Because the auto-generated treebank contains errors, these constraints are noisy. To overcome this problem, we use large-scale unannotated data to verify the constraints, and we design a set of effective bilingual features for parsing models based on the verified results. Experimental results show that our new parsers significantly outperform state-of-the-art baselines. Moreover, our approach still yields an improvement when we use a larger monolingual treebank, which gives a much stronger baseline. Notably, our approach can also be used in a purely monolingual setting with the help of SMT.
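The abstract does not spell out how the noisy constraints are verified, so the following is only a minimal sketch of one way the idea could look, not the paper's actual procedure. It assumes that a constraint is a candidate (head, dependent) word pair projected from the other language, that the unannotated data has been auto-parsed, and that verification means bucketing the pair by how often it occurs there. The function names, the sentence representation, the bucket labels, and the thresholds are all hypothetical.

```python
from collections import Counter


def count_dependencies(parsed_sentences):
    """Count (head, dependent) word pairs in auto-parsed sentences.

    Assumed representation: each sentence is a list of (word, head_index)
    pairs, where head_index is the 0-based position of the head word,
    or -1 for the root.
    """
    counts = Counter()
    for sentence in parsed_sentences:
        for word, head_index in sentence:
            if head_index >= 0:
                counts[(sentence[head_index][0], word)] += 1
    return counts


def verify_constraint(head, dependent, counts, high=2, low=1):
    """Map a projected dependency to a coarse reliability bucket.

    The thresholds are illustrative only; they are not from the paper.
    """
    freq = counts[(head, dependent)]
    if freq >= high:
        return "HIGH"
    if freq >= low:
        return "LOW"
    return "UNSEEN"


def bilingual_feature(head, dependent, counts):
    """Build a hypothetical feature string that a parsing model could consume."""
    return "BI_VERIFIED=" + verify_constraint(head, dependent, counts)


# Toy stand-in for large-scale auto-parsed data.
auto_parsed = [
    [("he", 1), ("ate", -1), ("fish", 1)],
    [("she", 1), ("ate", -1), ("fish", 1)],
    [("they", 1), ("ate", -1), ("rice", 1)],
]
counts = count_dependencies(auto_parsed)

print(bilingual_feature("ate", "fish", counts))  # seen twice in the toy data
print(bilingual_feature("fish", "ate", counts))  # never seen in the toy data
```

Bucketing the counts, rather than trusting every projected dependency outright, lets a parser learn separate weights for well-attested and poorly attested constraints, which is one plausible way to keep errors in an auto-generated treebank from dominating.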
