Paper information - A New Combined Lexical and Statistical based Sentence Level Alignment Algorithm for Parallel Texts

Parallel text alignment is an active research area in natural language processing. In this paper, we propose a method for sentence alignment of parallel texts that is based on both lexical and statistical information. The alignment procedure uses dynamic programming. We ran our experiments on Spanish and English texts. We use lexical information from a bilingual Spanish-English dictionary, as well as sentence length measured in words and in characters. The proposed method was tested on a corpus of fiction texts, where multiple alignments, omissions and insertions are more frequent than in other types of texts. We obtained better results than the standard Vanilla aligner, which uses a purely statistical approach.
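
The abstract does not give the cost function, so the following is only a minimal sketch of how a dictionary score and length measures in words and characters could be combined inside a dynamic-programming aligner. The bead set, the flat `SKIP_COST`, the `weight` parameter, the whitespace tokenization, and the function names `length_cost`, `lexical_cost` and `align` are all assumptions made for illustration, not the authors' actual design.

```python
from typing import Dict, List, Set, Tuple

# Assumed alignment "beads": (source sentences used, target sentences used).
BEADS = [(1, 1), (1, 0), (0, 1), (2, 1), (1, 2)]
SKIP_COST = 1.0  # assumed flat penalty for an omission or insertion


def length_cost(src: str, tgt: str) -> float:
    """Relative length mismatch, averaged over word and character counts."""
    def mismatch(a: int, b: int) -> float:
        return abs(a - b) / max(a, b, 1)
    words = mismatch(len(src.split()), len(tgt.split()))
    chars = mismatch(len(src), len(tgt))
    return (words + chars) / 2


def lexical_cost(src: str, tgt: str, dictionary: Dict[str, Set[str]]) -> float:
    """1 minus the share of source words with a dictionary translation in tgt."""
    src_words = src.lower().split()
    tgt_words = set(tgt.lower().split())
    if not src_words:
        return 1.0
    hits = sum(1 for w in src_words if dictionary.get(w, set()) & tgt_words)
    return 1.0 - hits / len(src_words)


def align(src: List[str], tgt: List[str], dictionary: Dict[str, Set[str]],
          weight: float = 0.5) -> List[Tuple[List[int], List[int]]]:
    """Return the cheapest bead sequence as (source indices, target indices)."""
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    # Forward relaxation: cell (i, j) is final before it is expanded,
    # because every bead moves to a later row or a later column.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for di, dj in BEADS:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                if di == 0 or dj == 0:
                    step = SKIP_COST
                else:
                    s, t = " ".join(src[i:ni]), " ".join(tgt[j:nj])
                    step = (weight * lexical_cost(s, t, dictionary)
                            + (1 - weight) * length_cost(s, t))
                if cost[i][j] + step < cost[ni][nj]:
                    cost[ni][nj] = cost[i][j] + step
                    back[ni][nj] = (i, j)
    # Trace the best path back from the final cell to the origin.
    path = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        path.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    path.reverse()
    return path


# Toy Spanish-English dictionary and texts (illustrative data only).
toy_dict = {"el": {"the"}, "gato": {"cat"}, "duerme": {"sleeps"},
            "llueve": {"rains"}, "ella": {"she"}, "canta": {"sings"}}
spanish = ["el gato duerme", "llueve", "ella canta"]
english = ["the cat sleeps", "it rains", "she sings"]
print(align(spanish, english, toy_dict))
```

In this sketch, omissions and insertions are the (1, 0) and (0, 1) beads and multiple alignments are the (2, 1) and (1, 2) beads; a real system would tune the skip penalty and the weighting on held-out data rather than fix them by hand.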

J.-P. Posadas | H. Jiménez
