Acronym extraction using SVM with Uneven Margins

Extracting acronyms and their expansions from plain text is an important problem in text mining. Previous research shows that the problem can be addressed with machine learning by converting acronym extraction into binary classification. We investigate this classification problem and find that the classes are highly unbalanced: positive instances are very rare compared to negative ones. We therefore tackle the problem with an uneven-margin classifier, the SVM with Uneven Margins. Experimental results show that our approach outperforms baseline methods based on heuristic rules and conventional SVM models. They also show how the uneven-margins classifier trades off precision against recall in extraction.
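
To make the idea of uneven margins concrete, the following is a minimal sketch and not the authors' implementation. It swaps in a simpler linear learner, a perceptron with uneven margins, in place of the SVM, and it assumes two made-up numeric features per candidate (acronym, expansion) pair. The function names, feature values, and margin settings (`tau_pos`, `tau_neg`) are all hypothetical. The rare positive class is given a larger margin than the negative class, which pushes the decision boundary away from the positives.

```python
# Sketch of a perceptron with uneven margins (a stand-in for the SVM variant).
# All data and parameter values below are illustrative assumptions.

def train_uneven_margin_perceptron(X, y, tau_pos=1.0, tau_neg=0.1,
                                   eta=1.0, max_epochs=1000):
    """Train a linear classifier with a separate margin for each class."""
    w = [0.0] * len(X[0])
    b = 0.0
    # Squared radius of the data, used to scale the bias update.
    r2 = max(sum(v * v for v in x) for x in X)
    for _ in range(max_epochs):
        updated = False
        for x, label in zip(X, y):
            margin = tau_pos if label > 0 else tau_neg
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            # Update whenever the example is not beyond its class margin.
            if label * score <= margin:
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
                b += eta * label * r2
                updated = True
        if not updated:
            break  # every example satisfies its margin
    return w, b


def predict(w, b, x):
    """Return +1 (acronym-expansion pair) or -1 (not a pair)."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


# Hypothetical, unbalanced toy data: one positive, four negatives.
X = [[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.1, 0.3], [0.3, 0.2]]
y = [-1, -1, 1, -1, -1]

w, b = train_uneven_margin_perceptron(X, y)
predictions = [predict(w, b, x) for x in X]
```

In this sketch, raising `tau_pos` relative to `tau_neg` tends to favor recall on the rare positive class, while lowering it tends to favor precision. This mirrors the kind of precision/recall tradeoff the abstract describes.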
