Multiple Evidence for Term Extraction in Broad Domains

This paper describes a method for extracting two-word domain terms by combining several of their features. The features are computed from three sources: occurrence statistics in a domain-specific text collection, statistics from global search engines, and a domain-specific thesaurus. The approach is evaluated against manually created thesauri. We show that using multiple features considerably improves the automatic extraction of domain-specific terms, and we compare the quality of the proposed method in two different domains.
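As a rough illustration of what combining evidence from the three sources could look like, the sketch below scores two-word candidates with one feature per source and ranks them by a weighted sum. Everything in it is an assumption made for illustration: the `Candidate` fields, the particular features, the hit-count ratio, and the hand-set `WEIGHTS` are hypothetical and are not taken from the paper, which does not specify its features or combination scheme in this abstract.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    """A two-word candidate term with raw evidence (all fields hypothetical)."""
    phrase: str
    corpus_freq: int         # occurrences in the domain-specific collection
    web_domain_hits: int     # search hits for the phrase plus a domain cue word
    web_total_hits: int      # search hits for the phrase alone
    words_in_thesaurus: int  # how many of the two words are thesaurus entries (0..2)


def features(c: Candidate) -> list[float]:
    """One illustrative feature per evidence source."""
    web_ratio = c.web_domain_hits / c.web_total_hits if c.web_total_hits else 0.0
    return [
        math.log1p(c.corpus_freq),   # domain collection statistics
        web_ratio,                   # search engine statistics
        c.words_in_thesaurus / 2.0,  # thesaurus evidence
    ]


# Illustrative hand-set weights; in practice they would be fitted on labeled data.
WEIGHTS = [0.5, 2.0, 1.0]


def score(c: Candidate) -> float:
    """Weighted sum of the features (a simple stand-in for a learned combiner)."""
    return sum(w * f for w, f in zip(WEIGHTS, features(c)))


def rank(candidates: list[Candidate]) -> list[Candidate]:
    """Order candidates from most to least term-like."""
    return sorted(candidates, key=score, reverse=True)


if __name__ == "__main__":
    cands = [
        Candidate("last year", 150, 50, 1000, 0),
        Candidate("neural network", 120, 800, 1000, 2),
    ]
    for c in rank(cands):
        print(f"{c.phrase}: {score(c):.3f}")
```

In this toy setup a frequent but generic phrase can be outranked by a slightly rarer one that is supported by the web and thesaurus features, which is the intuition behind using more than one source of evidence.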
