Grammarless extraction of phrasal translation examples from parallel texts

We describe a method for identifying subsentential phrasal translation examples in sentence-aligned parallel corpora, using only a probabilistic translation lexicon for the language pair. Our method differs from previous approaches in that (1) it is founded on a formal basis, making use of an inversion transduction grammar (ITG) formalism that we recently developed for bilingual language modeling, and (2) it requires no language-specific monolingual grammars for the source and target languages. Instead, we devise a generic, language-independent constituent-matching ITG whose inherent expressiveness provides the desired degree of matching flexibility. Bilingual parsing, in conjunction with a stochastic version of the ITG formalism, performs the phrasal translation extraction.
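To make the bilingual-parsing step more concrete, here is a minimal sketch. It is not the paper's implementation, and the abstract does not spell out the grammar. The sketch assumes the simplest bracketing ITG, with a single nonterminal A, straight productions A → [A A] (same order on both sides), inverted productions A → ⟨A A⟩ (order swapped on the second side), and lexical productions A → e/f, e/ε, ε/f. The rule probabilities, the null-alignment probability, the three-entry `LEXICON`, and the helper names `viterbi_itg` and `phrase_pairs` are all hypothetical and were chosen for illustration. The Viterbi parse is found by dynamic programming over pairs of spans. Every constituent that covers words on both sides is then read off as a candidate phrasal translation example.

```python
import math
from functools import lru_cache

NEG_INF = float("-inf")

# Hypothetical toy translation lexicon: (English word, French word) -> probability.
# The numbers are invented for illustration; a real system would estimate them.
LEXICON = {("the", "la"): 0.5, ("red", "rouge"): 0.8, ("car", "voiture"): 0.7}
E = ["the", "red", "car"]
F = ["la", "voiture", "rouge"]


def size(span):
    """Total number of words a span pair (s, t, u, v) covers on both sides."""
    s, t, u, v = span
    return (t - s) + (v - u)


def viterbi_itg(e, f, lexicon, p_straight=0.45, p_inverted=0.45,
                p_lex=0.1, null_prob=1e-3):
    """Viterbi bilingual parse of (e, f) under a toy bracketing stochastic ITG.

    Returns (log_prob, tree). Each node is (kind, s, t, u, v, children)
    and covers e[s:t] on one side and f[u:v] on the other.
    """

    def lex_score(s, t, u, v):
        # Lexical productions A -> e/f, A -> e/eps and A -> eps/f (assumed form).
        if t - s == 1 and v - u == 1:
            p = lexicon.get((e[s], f[u]), 0.0)
        elif (t - s, v - u) in ((1, 0), (0, 1)):
            p = null_prob
        else:
            return NEG_INF
        return math.log(p_lex * p) if p > 0 else NEG_INF

    @lru_cache(maxsize=None)
    def best(s, t, u, v):
        score = lex_score(s, t, u, v)
        tree = ("lex", s, t, u, v, ()) if score > NEG_INF else None
        for S in range(s, t + 1):
            for U in range(u, v + 1):
                # Straight [A A] keeps the order of the two halves on both sides;
                # inverted <A A> pairs the first e half with the second f half.
                for kind, p, left, right in (
                    ("straight", p_straight, (s, S, u, U), (S, t, U, v)),
                    ("inverted", p_inverted, (s, S, U, v), (S, t, u, U)),
                ):
                    # Each child must cover at least one word, so recursion terminates.
                    if size(left) == 0 or size(right) == 0:
                        continue
                    left_score, left_tree = best(*left)
                    right_score, right_tree = best(*right)
                    cand = math.log(p) + left_score + right_score
                    if cand > score:
                        score = cand
                        tree = (kind, s, t, u, v, (left_tree, right_tree))
        return score, tree

    return best(0, len(e), 0, len(f))


def phrase_pairs(tree, e, f):
    """List (English, French) phrase pairs for constituents covering both sides."""
    kind, s, t, u, v, children = tree
    pairs = [(" ".join(e[s:t]), " ".join(f[u:v]))] if t > s and v > u else []
    for child in children:
        pairs.extend(phrase_pairs(child, e, f))
    return pairs


if __name__ == "__main__":
    score, tree = viterbi_itg(E, F, LEXICON)
    print(f"log P = {score:.3f}")
    # The pairs include ('red car', 'voiture rouge'), found by an inverted node.
    for pair in phrase_pairs(tree, E, F):
        print(pair)
```

The inverted production is what lets "red car" match "voiture rouge" despite the word-order swap, and no monolingual grammar for either language is consulted. This naive search visits O(n^4) span pairs, each with O(n^2) split points, which makes it O(n^6) overall. That cost is tolerable only for short sentences like the toy example.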
