Paper Information - Improving Word Alignment Using Alignment of Deep Structures

In this paper, we describe the differences between classical word alignment on the surface (word-layer alignment) and alignment of deep syntactic sentence representations (tectogrammatical alignment). The deep structures we use are dependency trees whose nodes are content (autosemantic) words. Most function words, such as prepositions, articles, and auxiliary verbs, are hidden. We introduce an algorithm that aligns such trees using a perceptron-based scoring function. For evaluation purposes, a set of parallel sentences was manually aligned. We show that using statistical word alignment (GIZA++) can improve the tectogrammatical alignment. Surprisingly, we also show that the tectogrammatical alignment can then be used to significantly improve the original word alignment.
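The abstract does not spell out the scoring function or the alignment procedure, so the following is only a minimal sketch of what perceptron-based scoring of candidate node pairs might look like. Everything in it is an assumption made for illustration: the node representation (dicts with `id` and `pos`), the two toy features (`giza_link`, `same_pos`), the greedy best-match decoding in `align`, and the simple update rule in `train`. It is not the algorithm from the paper.

```python
from collections import defaultdict


def features(src, tgt, giza_links):
    """Hypothetical binary features for one candidate pair of tree nodes."""
    feats = {"bias": 1.0}
    # Assumed feature: the surface words of both nodes are linked by GIZA++.
    if (src["id"], tgt["id"]) in giza_links:
        feats["giza_link"] = 1.0
    # Assumed feature: both nodes carry the same coarse part of speech.
    if src["pos"] == tgt["pos"]:
        feats["same_pos"] = 1.0
    return feats


def score(weights, feats):
    """Linear score: dot product of the weight vector and the feature vector."""
    return sum(weights[name] * value for name, value in feats.items())


def align(weights, src_nodes, tgt_nodes, giza_links):
    """Greedy decoding: link each source node to its best target if score > 0."""
    links = set()
    for s in src_nodes:
        best = max(
            tgt_nodes,
            key=lambda t: score(weights, features(s, t, giza_links)),
        )
        if score(weights, features(s, best, giza_links)) > 0:
            links.add((s["id"], best["id"]))
    return links


def train(examples, epochs=5):
    """Perceptron training: reward missed gold links, penalise wrong links."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for src_nodes, tgt_nodes, giza_links, gold in examples:
            predicted = align(weights, src_nodes, tgt_nodes, giza_links)
            for s in src_nodes:
                for t in tgt_nodes:
                    pair = (s["id"], t["id"])
                    if (pair in gold) != (pair in predicted):
                        delta = 1.0 if pair in gold else -1.0
                        for name, value in features(s, t, giza_links).items():
                            weights[name] += delta * value
    return weights
```

A training example here is a tuple of source nodes, target nodes, the set of GIZA++ links projected onto those nodes, and the set of manually annotated gold links. Using the statistical alignment as just one feature among others mirrors the abstract's claim that GIZA++ output can improve the tectogrammatical alignment, but the actual feature set used by the author is not given in this text.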

David Mareček
