Learning Bilingual Collocations by Word-Level Sorting

This paper proposes a new method for learning bilingual collocations from sentence-aligned parallel corpora. The method comprises two steps: (1) extracting useful word chunks (n-grams) by word-level sorting, and (2) constructing bilingual collocations by combining the word chunks acquired in step (1). We apply the method to a challenging text pair: a Japanese stock market bulletin and its English abstract. Domain-specific collocations are captured well, even when they do not appear in dictionaries of economic terms.
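The two steps can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how chunks are scored or combined, so the Dice coefficient over aligned sentence pairs, the thresholds, the function names (`frequent_chunks`, `pair_chunks`), and the toy romanized corpus are all assumptions made for illustration.

```python
def frequent_chunks(sentences, max_n=3, min_count=2):
    """Step 1 (assumed reading): count word n-grams by sorting word-level suffixes."""
    # Each word position starts a suffix, truncated to max_n words.
    suffixes = sorted(
        tuple(words[i:i + max_n])
        for words in sentences
        for i in range(len(words))
    )
    chunks = {}
    for n in range(1, max_n + 1):
        prev, run = None, 0
        # After sorting, suffixes sharing the same first n words are adjacent,
        # so each n-gram's frequency is the length of one contiguous run.
        for suffix in suffixes:
            if len(suffix) < n:
                continue
            key = suffix[:n]
            if key == prev:
                run += 1
            else:
                if prev is not None and run >= min_count:
                    chunks[prev] = run
                prev, run = key, 1
        if prev is not None and run >= min_count:
            chunks[prev] = run
    return chunks


def contains(words, chunk):
    """True if the chunk occurs as a contiguous word sequence in the sentence."""
    n = len(chunk)
    return any(tuple(words[i:i + n]) == chunk for i in range(len(words) - n + 1))


def pair_chunks(src_sents, tgt_sents, threshold=0.9, max_n=3, min_count=2):
    """Step 2 (assumption): pair chunks whose aligned-sentence Dice score is high."""
    def occurrences(sents):
        return {
            chunk: {k for k, words in enumerate(sents) if contains(words, chunk)}
            for chunk in frequent_chunks(sents, max_n, min_count)
        }

    src_occ, tgt_occ = occurrences(src_sents), occurrences(tgt_sents)
    pairs = []
    for s, s_ids in src_occ.items():
        for t, t_ids in tgt_occ.items():
            dice = 2 * len(s_ids & t_ids) / (len(s_ids) + len(t_ids))
            if dice >= threshold:
                pairs.append((s, t, dice))
    return sorted(pairs, key=lambda p: -p[2])


# Hypothetical, pre-tokenized toy corpus; sentence k of src aligns with sentence k of tgt.
src = [["kabuka", "ga", "joushou"], ["kabuka", "ga", "geraku"], ["en", "ga", "joushou"]]
tgt = [["stock", "prices", "rose"], ["stock", "prices", "fell"], ["the", "yen", "rose"]]

for s, t, score in pair_chunks(src, tgt, max_n=2):
    print(" ".join(s), "<->", " ".join(t), round(score, 2))
```

Sorting the suffixes once lets every n-gram length be counted with a single linear pass, which is the appeal of a sorting-based approach on large corpora. The pairing step here is deliberately naive (it compares every source chunk with every target chunk) and would need pruning on real data.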
