Cross-Language Mining for Acronyms and Their Completions from the Web

We propose a method for aligning biomedical acronyms and their long-form definitions across languages. Using a freely available search and extraction tool, we mine abbreviations together with their fully expanded forms from the Web on a large scale. In a subsequent step, language-specific variants, synonyms, and translations of the extracted acronym definitions are normalized by mapping them to a shared, language-independent semantic interlingua.
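
The two steps described above (extraction, then normalization to a shared representation) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the parenthesis pattern and the right-to-left character matching stand in for the unnamed extraction tool, and `TOY_LEXICON` is a hypothetical three-word stand-in for the interlingua, not the authors' resource.

```python
import re
from collections import defaultdict

# Hypothetical toy interlingua: language-specific words mapped to shared
# identifiers. A real resource would be far larger; these entries are
# invented purely for this example.
TOY_LEXICON = {
    "en": {"acquired": "#acquire", "immunodeficiency": "#immunodef",
           "syndrome": "#syndrom"},
    "es": {"adquirida": "#acquire", "inmunodeficiencia": "#immunodef",
           "síndrome": "#syndrom"},
}

# Assumed surface pattern: "long form (ACRONYM)".
PAREN = re.compile(r"\(([A-Z]{2,10})\)")


def best_long_form(short, candidate):
    """Match the acronym's characters right to left inside the candidate.

    The first acronym character must start a word. Returns the matched
    long form, or None if the characters cannot all be matched.
    """
    l = len(candidate) - 1
    for s in range(len(short) - 1, -1, -1):
        c = short[s].lower()
        while l >= 0 and (
            candidate[l].lower() != c
            or (s == 0 and l > 0 and candidate[l - 1].isalnum())
        ):
            l -= 1
        if l < 0:
            return None
        l -= 1
    start = candidate.rfind(" ", 0, l + 1) + 1
    return candidate[start:]


def extract_pairs(text):
    """Return (acronym, long form) pairs found in the text."""
    pairs = []
    for m in PAREN.finditer(text):
        short = m.group(1)
        words = text[:m.start()].split()
        # Only look at a bounded window of words before the parenthesis.
        window = min(len(short) + 5, len(short) * 2)
        long_form = best_long_form(short, " ".join(words[-window:]))
        if long_form:
            pairs.append((short, long_form))
    return pairs


def normalize(long_form, lang):
    """Map a long form to an order-independent set of shared identifiers."""
    lexicon = TOY_LEXICON[lang]
    return frozenset(lexicon[w] for w in long_form.lower().split() if w in lexicon)


def align(docs):
    """Group (language, acronym) pairs whose long forms normalize identically."""
    groups = defaultdict(set)
    for lang, text in docs:
        for short, long_form in extract_pairs(text):
            key = normalize(long_form, lang)
            if key:
                groups[key].add((lang, short))
    return groups


docs = [
    ("en", "Acquired immunodeficiency syndrome (AIDS) is a disease."),
    ("es", "El síndrome de inmunodeficiencia adquirida (SIDA) es una enfermedad."),
]
for key, members in align(docs).items():
    print(sorted(key), sorted(members))
```

In this sketch the English AIDS and the Spanish SIDA end up in the same group because their long forms reduce to the same set of identifiers, even though word order differs and the Spanish function word "de" has no lexicon entry. Using an unordered set is a deliberate simplification; it would also conflate distinct terms that happen to share the same words.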
