Paper information - Alignement de textes bilingues par classification ascendante hiérarchique

Existing translations contain a wealth of ready-made solutions that can be reused to produce new high-quality translations. For this reason, translation resources are frequently stored in electronic databases that provide information retrieval facilities. Bilingual text alignment enables more efficient use of these resources by reconstructing the translation-equivalence links between corresponding segments of a text and its translations in other languages. Current text alignment algorithms perform quite successfully at the sentence level. However, further research is needed on finer-grained text alignment. In this regard, we propose to identify translation correspondences on the basis of hierarchical cluster analysis of graphical forms and repeated segments of bilingual texts. Through progressive agglomeration, this technique yields clusters of textual units with similar (or identical) distributional profiles. The results obtained suggest that hierarchical cluster analysis can be applied for a wide range of purposes in bilingual text alignment.
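
The general idea of clustering word forms by distributional profile can be sketched as follows. This is only an illustrative sketch, not the author's implementation: it assumes a sentence-aligned toy bitext, represents each form's profile as the set of segment indices in which it occurs, and uses Jaccard distance with average linkage, all of which are assumptions made here. The names `BITEXT`, `profiles`, `jaccard_distance`, and `agglomerate` are hypothetical.

```python
from itertools import combinations

# Toy sentence-aligned bitext (hypothetical data, for illustration only).
BITEXT = [
    ("the cat sleeps", "le chat dort"),
    ("the dog sleeps", "le chien dort"),
    ("the cat eats", "le chat mange"),
    ("the dog eats", "le chien mange"),
]


def profiles(bitext):
    """Map each (language, form) pair to the set of segment indices where it occurs."""
    prof = {}
    for i, (src, tgt) in enumerate(bitext):
        for lang, sentence in (("en", src), ("fr", tgt)):
            for form in sentence.split():
                prof.setdefault((lang, form), set()).add(i)
    return prof


def jaccard_distance(a, b):
    """Distance between two profiles: 0.0 when identical, 1.0 when disjoint."""
    return 1.0 - len(a & b) / len(a | b)


def agglomerate(prof, threshold=0.0):
    """Average-linkage agglomeration (an assumed criterion).

    Repeatedly merges the two closest clusters and stops as soon as the
    closest pair is farther apart than `threshold`.
    """
    clusters = [frozenset([key]) for key in sorted(prof)]

    def dist(c1, c2):
        pairs = [(x, y) for x in c1 for y in c2]
        return sum(jaccard_distance(prof[x], prof[y]) for x, y in pairs) / len(pairs)

    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: dist(*pair))
        if dist(c1, c2) > threshold:
            break
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return clusters


if __name__ == "__main__":
    for cluster in agglomerate(profiles(BITEXT)):
        print(sorted(cluster))
```

With `threshold=0.0`, only forms with identical profiles are merged, so on this toy bitext each resulting cluster pairs one English form with its French counterpart (for example `cat` with `chat`). Raising the threshold would also group forms whose profiles are merely similar.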

Maria Zimina
