Automatic Annotation of Semantic Term Types in the Complete ACL Anthology Reference Corpus

In this paper, we present an automated tagging approach aimed at enhancing a well-known resource, the ACL Anthology Reference Corpus, with semantic class labels for more than 20,000 technical terms relevant to the domain of computational linguistics. We use state-of-the-art classification techniques to assign semantic class labels to technical terms extracted from several reference term lists. We also sketch a set of research questions and approaches directed towards the integrated analysis of scientific corpora. To this end, we query the data set resulting from our annotation effort on both the term level and the semantic class level.
