ICE-TEA: In-Context Expansion and Translation of English Abbreviations

The wide use of abbreviations in modern texts poses interesting challenges and opportunities for NLP. Besides being dynamic in nature, abbreviations are highly polysemous compared with regular words. Technologies that exhibit some level of language understanding may be adversely affected by the presence of abbreviations. This paper addresses two related problems: (1) expanding abbreviations given a context, and (2) translating sentences that contain abbreviations. First, an efficient retrieval-based method for English abbreviation expansion is presented. Then, a hybrid system is used to choose among simple abbreviation-translation methods. On a test set of sentences that contain abbreviations, the hybrid system improves on the baseline MT system by 1.48 BLEU points.
