Methods for automatic term recognition in domain-specific text collections: A survey

Applications that process domain-specific text often rely on glossaries and ontologies, and term recognition is the main step in constructing such resources. This paper surveys existing definitions of a term and its linguistic features, formulates the term recognition task, and analyzes currently available methods for automatic term recognition: methods for collecting candidates, methods based on statistics and on the contexts of term occurrences, methods using topic models, and methods based on external resources (such as text collections from other domains, ontologies, and Wikipedia). It also provides an overview of standard methodologies and datasets for experimental research.
