Paper information - Parallel text alignment using crosslingual information retrieval techniques

In this chapter, we demonstrate that aligning a sentence with its translation is not fundamentally different from retrieving a sentence on the same topic from the target corpus, using the source sentence as a query. Both processes rely on the semantic proximity of two sentences in different languages. Their major difference is that information retrieval only needs to ensure that the retrieved sentence contains most of the information in the query, whereas sentence alignment also requires that the parts not common to both languages be as small as possible. A crosslingual query system can therefore be used to obtain candidates for sentence alignment, provided that the measure of semantic proximity is slightly modified. More classical techniques that take sequential order into account can also be used, but our approach is very robust to text desynchronization, such as text segments missing in one language, or texts such as glossaries and indexes that are not in the same order in different languages.
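The contrast between the two proximity measures can be illustrated with a minimal sketch. It is not the authors' measure: the toy lexicon, the function names, and the use of an F-measure as the "slightly modified" alignment score are all assumptions made for illustration. The retrieval score only rewards coverage of the query, while the alignment score also penalizes target words that have no counterpart in the source sentence.

```python
# Hypothetical toy English-to-French lexicon; a real system would rely on a
# full bilingual dictionary and linguistic normalization.
LEXICON = {
    "the": {"le", "la"},
    "cat": {"chat"},
    "sleeps": {"dort"},
    "on": {"sur"},
    "sofa": {"canapé"},
}


def translate_terms(source_sentence):
    """Return one set of candidate translations per known source word."""
    words = source_sentence.lower().split()
    return [LEXICON[w] for w in words if w in LEXICON]


def match_counts(source_sentence, target_sentence):
    """Count query terms found in the target and target words explained by the query."""
    target_words = target_sentence.lower().split()
    terms = translate_terms(source_sentence)
    target_set = set(target_words)
    covered_terms = sum(1 for t in terms if t & target_set)
    covered_words = sum(1 for w in target_words if any(w in t for t in terms))
    return covered_terms, len(terms), covered_words, len(target_words)


def retrieval_score(source_sentence, target_sentence):
    """Retrieval view: fraction of the query information present in the target."""
    covered_terms, n_terms, _, _ = match_counts(source_sentence, target_sentence)
    return covered_terms / n_terms if n_terms else 0.0


def alignment_score(source_sentence, target_sentence):
    """Alignment view (assumed F-measure): also penalize unmatched target words."""
    covered_terms, n_terms, covered_words, n_words = match_counts(
        source_sentence, target_sentence
    )
    if n_terms == 0 or n_words == 0:
        return 0.0
    recall = covered_terms / n_terms
    precision = covered_words / n_words
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)


def best_alignment(source_sentence, target_sentences):
    """Pick the best candidate regardless of its position in the target text."""
    return max(target_sentences, key=lambda t: alignment_score(source_sentence, t))
```

Because `best_alignment` ignores sentence positions, a sketch of this kind is unaffected by missing or reordered segments, which is the robustness property the abstract describes. A long target sentence that merely contains the translation gets a full retrieval score but a reduced alignment score.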

Christian Fluhr | Frédérique Bisson | Faiza Elkateb
