Automatic recognition of domain-specific terms: an experimental evaluation

This paper presents an experimental evaluation of state-of-the-art approaches to automatic term recognition that combine multiple features: a machine learning method and a voting algorithm. We show that in most cases the machine learning approach obtains the best results and needs little training data; we also identify the best-performing subsets of the popular features.
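
To illustrate what combining features by voting can look like, here is a minimal sketch. It assumes one common form of voting in which each feature ranks the candidate terms and a candidate's final score is the sum of its reciprocal ranks; the abstract does not specify the formula, so this is an assumption rather than the paper's exact algorithm. The function name `vote`, the feature names, and all numeric values are hypothetical toy data.

```python
from collections import defaultdict


def vote(feature_scores):
    """Rank candidate terms by summing reciprocal ranks across features.

    feature_scores maps a feature name to a dict {candidate: score},
    where a higher score means "more term-like" (an assumption here).
    """
    totals = defaultdict(float)
    for scores in feature_scores.values():
        # Order candidates by this feature, best first (rank 1 = best).
        ranked = sorted(scores, key=scores.get, reverse=True)
        for rank, term in enumerate(ranked, start=1):
            totals[term] += 1.0 / rank
    # Final list: candidates ordered by their combined vote, best first.
    return sorted(totals, key=totals.get, reverse=True)


# Hypothetical toy values for three illustrative features.
features = {
    "tf_idf": {"ontology": 0.9, "corpus": 0.6, "result": 0.1},
    "weirdness": {"ontology": 4.0, "corpus": 5.0, "result": 0.5},
    "c_value": {"ontology": 3.2, "corpus": 1.0, "result": 0.2},
}

ranking = vote(features)
print(ranking)
```

Voting of this kind needs no labeled data, whereas a machine learning approach would instead treat the same feature values as inputs to a classifier trained on candidates already labeled as terms or non-terms.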
