Paper information - Paragraph-Level Alignment of an English-Spanish Parallel Corpus of Fiction Texts Using Bilingual Dictionaries


Aligned parallel corpora are important linguistic resources, useful in many text processing tasks such as machine translation, word sense disambiguation, and dictionary compilation. Nevertheless, few linguistic resources of this type are available, especially for fiction texts, because the texts are difficult to collect and manual alignment is costly. In this paper, we describe an automatically aligned English-Spanish parallel corpus of fiction texts and evaluate our alignment method, which uses linguistic data, namely existing bilingual dictionaries, to calculate word similarity. The method is based on a simple idea: if a meaningful word is present in the source text, then one of its dictionary translations should be present in the target text. Experimental results of alignment at the paragraph level are described.
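The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact similarity formula, so the score below (the fraction of meaningful source words that have at least one dictionary translation in the target paragraph) is an assumption. The names `TOY_DICTIONARY`, `STOP_WORDS`, `tokenize`, and `dictionary_similarity` are hypothetical, and "meaningful word" is approximated here as any non-stop word that has a dictionary entry.

```python
import re
from typing import Dict, List, Set

# Hypothetical toy dictionary (English -> Spanish); a real system would
# load a full bilingual dictionary and normalize word forms first.
TOY_DICTIONARY: Dict[str, Set[str]] = {
    "house": {"casa", "hogar"},
    "white": {"blanco", "blanca"},
    "dog": {"perro"},
}

# Hypothetical stop list used to filter out non-meaningful words.
STOP_WORDS: Set[str] = {"the", "a", "in", "is", "of", "and"}


def tokenize(text: str) -> List[str]:
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[^\W\d_]+", text.lower())


def dictionary_similarity(
    source: str,
    target: str,
    dictionary: Dict[str, Set[str]],
    stop_words: Set[str],
) -> float:
    """Fraction of meaningful source words with a translation in the target.

    Assumption: a word is "meaningful" if it is not a stop word and has
    an entry in the dictionary. Returns 0.0 if no such word exists.
    """
    target_tokens = set(tokenize(target))
    content_words = [
        word
        for word in tokenize(source)
        if word not in stop_words and word in dictionary
    ]
    if not content_words:
        return 0.0
    matched = sum(1 for word in content_words if dictionary[word] & target_tokens)
    return matched / len(content_words)


if __name__ == "__main__":
    english = "The dog is in the white house"
    spanish = "El perro está en la casa blanca"
    print(dictionary_similarity(english, spanish, TOY_DICTIONARY, STOP_WORDS))
```

In a paragraph aligner, a score of this kind would be computed for candidate pairs of source and target paragraphs, with high-scoring pairs preferred. How candidates are generated and how the final alignment is selected is not described in the abstract and is left out of this sketch.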

Alexander F. Gelbukh | Grigori Sidorov | José Ángel Vera-Félix
