Constructing Chinese Abbreviation Dictionary: A Stacked Approach

Abbreviations are a common linguistic phenomenon: they are widely used, and their number grows rapidly. Correctly linking full forms to their abbreviations benefits many applications; for example, it can improve the recall of information retrieval systems. A natural solution is to build an abbreviation dictionary in advance. This paper investigates a stacked approach to automatic Chinese abbreviation generation, which tackles the problem in two stages. First, we use a sequence labeling method to generate a list of candidate abbreviations. Then, we use a search engine to incorporate web data, re-rank the candidates, and select the best one. We evaluate the method on a Chinese abbreviation corpus containing 8,015 abbreviation pairs. Experiments show that our method outperforms the baseline methods.
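The two-stage pipeline described above can be illustrated with a minimal sketch. It is not the paper's implementation, and it rests on stated assumptions. First, the per-character keep probabilities are a hypothetical stand-in for a trained sequence labeler. A real labeler, such as a CRF, scores whole label sequences rather than treating characters independently. Second, `web_hits` is a hypothetical table of search-engine counts. The function names `top_k_candidates` and `rerank`, the log-linear combination, the weight, and all numbers are illustrative choices.

```python
import heapq
import itertools
import math


def top_k_candidates(full_form, keep_probs, k=5):
    """Stage 1 sketch: score every keep/skip labeling of the characters.

    ASSUMPTION: each character is labeled independently with
    P(keep) = keep_probs[i] (strictly between 0 and 1), a stand-in for a
    trained sequence labeler. Returns up to k (abbreviation, log_score)
    pairs, best first.
    """
    if len(keep_probs) != len(full_form):
        raise ValueError("need one keep probability per character")
    scored = {}
    for labels in itertools.product((True, False), repeat=len(full_form)):
        kept = [ch for ch, keep in zip(full_form, labels) if keep]
        # A proper abbreviation keeps at least one character and drops at least one.
        if not kept or len(kept) == len(full_form):
            continue
        log_score = sum(
            math.log(p if keep else 1.0 - p) for p, keep in zip(keep_probs, labels)
        )
        abbr = "".join(kept)
        # Different labelings can produce the same string; keep the best score.
        scored[abbr] = max(log_score, scored.get(abbr, -math.inf))
    return heapq.nlargest(k, scored.items(), key=lambda item: item[1])


def rerank(candidates, web_hits, weight=0.5):
    """Stage 2 sketch: add weighted log web frequency to the labeler score.

    ASSUMPTION: web_hits maps a candidate to a search-engine hit count;
    the log-linear combination and the weight are illustrative choices.
    """
    def combined(item):
        abbr, log_score = item
        return log_score + weight * math.log1p(web_hits.get(abbr, 0))

    return sorted(candidates, key=combined, reverse=True)


if __name__ == "__main__":
    # Hypothetical labeler outputs for 清华大学 (Tsinghua University).
    cands = top_k_candidates("清华大学", [0.9, 0.8, 0.6, 0.25], k=3)
    print([a for a, _ in cands])  # ['清华大', '清华', '清大']
    # Hypothetical hit counts; web evidence promotes the conventional form.
    hits = {"清华": 90000, "清华大": 1200, "清大": 3000}
    print([a for a, _ in rerank(cands, hits)])  # ['清华', '清华大', '清大']
```

In this toy run, the labeler alone prefers 清华大. The web evidence then moves the conventional abbreviation 清华 to the top, which illustrates why a second, re-ranking stage can help. Exhaustive enumeration is exponential in the length of the full form. It is tolerable for short terms, but a practical labeler would produce its k-best list with a dynamic-programming decoder instead.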
