AGRA: An Analysis-Generation-Ranking Framework for Automatic Abbreviation from Paper Titles

People often choose word-like abbreviations to refer to items that have long descriptions. Such abbreviations are usually derived from the item's descriptive text; they are easy to remember and pronounce while preserving the key idea of the item. Coming up with a good abbreviation is not easy, even for humans. Previous naming-assistant systems compose names by applying hand-written rules, which may not perform well. In this paper, we propose to view the naming task as an artificial intelligence problem and create a data set in the domain of academic naming. To generate more refined names, we propose a three-step framework consisting of description analysis, candidate generation, and abbreviation ranking, each of which is parameterized and optimizable. We conduct experiments that compare different settings of our framework, using several analysis approaches, from different perspectives. Compared with online systems and baseline systems, our framework achieves the best results.
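
To make the three-step structure concrete, the following is a minimal sketch of an analysis, generation, and ranking pipeline. It is not the method evaluated in the paper: the stopword list, the prefix-based candidate enumeration, the scoring heuristic (coverage of content words plus vowel/consonant alternation as a crude proxy for pronounceability), and all function and parameter names are assumptions made for illustration only.

```python
import re
from itertools import product

# Hypothetical stopword list and vowel set; the paper's actual resources are not given here.
STOPWORDS = {"a", "an", "the", "of", "for", "from", "and", "in", "on", "to", "with"}
VOWELS = set("aeiou")


def analyze(title):
    """Step 1 (description analysis): give each word a weight; stopwords get 0."""
    words = [w.lower() for w in re.findall(r"[A-Za-z]+", title)]
    return [(w, 0.0 if w in STOPWORDS else 1.0) for w in words]


def generate(weighted, max_prefix=2, min_len=3, max_len=6):
    """Step 2 (candidate generation): keep 0..max_prefix leading letters of each word, in order."""
    options = [
        [w[:k] for k in range(0, min(max_prefix, len(w)) + 1)] for w, _ in weighted
    ]
    seen = set()
    for parts in product(*options):
        cand = "".join(parts)
        # Only the first derivation of each distinct string is kept.
        if min_len <= len(cand) <= max_len and cand not in seen:
            seen.add(cand)
            yield cand, parts


def score(parts, weighted):
    """Step 3 (abbreviation ranking): content-word coverage plus a pronounceability proxy."""
    total = sum(wt for _, wt in weighted) or 1.0
    coverage = sum(wt for p, (_, wt) in zip(parts, weighted) if p) / total
    cand = "".join(parts)
    alternations = sum((a in VOWELS) != (b in VOWELS) for a, b in zip(cand, cand[1:]))
    return coverage + alternations / max(len(cand) - 1, 1)


def abbreviate(title, top_k=5):
    """Run the three steps and return the top_k candidates in upper case."""
    weighted = analyze(title)
    ranked = sorted(generate(weighted), key=lambda c: (-score(c[1], weighted), c[0]))
    return [cand.upper() for cand, _ in ranked[:top_k]]


if __name__ == "__main__":
    print(abbreviate("A Fast Array of Wimpy Nodes"))
```

In this sketch the tunable quantities (`max_prefix`, the length bounds, and the two terms in `score`) stand in for the parameters that the framework describes as optimizable; a learned ranker or a language-model-based score could replace the hand-written heuristic without changing the overall structure.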
