Collocation translation based on sentence alignment and parsing

To date, substantial efforts have been devoted to the extraction of collocations from text corpora. However, only a few works deal with the subsequent processing of the results so that they can be successfully integrated into the NLP applications that could benefit from them (e.g., machine translation). This paper presents an accurate method for identifying translation equivalents of collocations in parallel text, whose main strengths are that it can handle flexible (not only rigid) collocations; it requires only limited resources and computation (no full alignment, no training needed); it deals with several language pairs; and it can even work when no bilingual dictionary is available. The method relies heavily on syntactic information provided by the Fips multilingual parser. Evaluation performed on 4000 verb-object collocations for different language pairs showed an average accuracy of 89.8% and a reasonable coverage (70.9%). These figures are higher than those reported in the evaluation of related work in collocation translation.

Keywords: collocation translation, collocation extraction, parsing, text alignment.
