A Linguistic Model for Terminology Extraction based Conditional Random Fields

In this paper, we show that a linear-chain Conditional Random Fields (CRF) model can be used for terminology extraction from a specialized text corpus. We also show that a CRF can model linguistic knowledge by incorporating grammatical observations into its features.

Keywords: Terminology Extraction; Term; CRF model; Linguistic knowledge; Features
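To make the idea of grammatical observations as CRF features concrete, the following is a minimal sketch and not the paper's actual feature set. It assumes each token arrives with a part-of-speech tag (the Universal POS names `NOUN`, `ADJ`, `VERB` are used here for illustration), that terms are annotated with a hypothetical `B-TERM`/`I-TERM`/`O` labeling scheme, and that the helper names `token_features` and `sentence_features` are invented for this example.

```python
from typing import Dict, List, Tuple

# A token is a (word, POS tag) pair; the tag set is an assumption for this sketch.
Token = Tuple[str, str]


def token_features(sentence: List[Token], i: int) -> Dict[str, object]:
    """Build a feature dictionary for the token at position i.

    The features encode grammatical observations: the token's own POS tag
    and the POS tags of its immediate neighbours.
    """
    word, pos = sentence[i]
    feats: Dict[str, object] = {
        "word.lower": word.lower(),
        "pos": pos,
        "is_noun": pos == "NOUN",
        "is_adj": pos == "ADJ",
    }
    if i > 0:
        prev_pos = sentence[i - 1][1]
        feats["prev_pos"] = prev_pos
        # POS bigram, e.g. ADJ_NOUN, a common shape for noun-phrase terms.
        feats["pos_bigram"] = prev_pos + "_" + pos
    else:
        feats["BOS"] = True  # beginning of sentence
    if i < len(sentence) - 1:
        feats["next_pos"] = sentence[i + 1][1]
    else:
        feats["EOS"] = True  # end of sentence
    return feats


def sentence_features(sentence: List[Token]) -> List[Dict[str, object]]:
    """Return one feature dictionary per token, in order."""
    return [token_features(sentence, i) for i in range(len(sentence))]


# Hypothetical training example: a tagged sentence and its term labels.
example: List[Token] = [
    ("conditional", "ADJ"),
    ("random", "ADJ"),
    ("fields", "NOUN"),
    ("are", "VERB"),
    ("trained", "VERB"),
]
labels = ["B-TERM", "I-TERM", "I-TERM", "O", "O"]
features = sentence_features(example)
```

Pairs of per-token feature dictionaries and label sequences like these are the kind of input a linear-chain CRF trainer typically consumes; the model then learns weights that relate grammatical patterns, such as an adjective followed by a noun, to term boundaries.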
