Multiword Term Extraction through Lexical Head Selection

We propose a semantically inspired multiword term extractor that selects candidates whose headword belongs to a seed list of approved single-word terms. To achieve this without the computational cost of a full parser, we apply a selection pipeline built from lightweight NLP tools: a POS tagger, a chunker, and a self-devised head detection module.
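
The selection idea can be illustrated with a minimal sketch. It is not the authors' implementation: the input is assumed to be already POS-tagged with Penn Treebank-style tags, the chunker is a simple adjective/noun run detector, and the head detector assumes the head is the rightmost noun of the chunk (a common heuristic for English noun phrases). All function names (`chunk_noun_phrases`, `find_head`, `extract_terms`) are hypothetical.

```python
from typing import Iterable, List, Tuple

# (token, POS tag); Penn Treebank-style tags are an assumption of this sketch.
Tagged = Tuple[str, str]


def chunk_noun_phrases(tagged: List[Tagged]) -> List[List[Tagged]]:
    """Collect maximal adjective/noun runs that end in a noun and span 2+ tokens."""
    chunks: List[List[Tagged]] = []
    current: List[Tagged] = []
    # A sentinel entry flushes the last open run at the end of the input.
    for token, tag in tagged + [("", "END")]:
        if tag.startswith("JJ") or tag.startswith("NN"):
            current.append((token, tag))
            continue
        # Drop trailing adjectives so every chunk ends in a noun.
        while current and not current[-1][1].startswith("NN"):
            current.pop()
        if len(current) >= 2:  # keep multiword candidates only
            chunks.append(current)
        current = []
    return chunks


def find_head(chunk: List[Tagged]) -> str:
    """Assumed right-hand head rule: the head is the chunk's last noun."""
    return chunk[-1][0].lower()


def extract_terms(tagged: List[Tagged], seed_terms: Iterable[str]) -> List[str]:
    """Keep candidates whose head belongs to the seed list of single-word terms."""
    seeds = {s.lower() for s in seed_terms}
    return [
        " ".join(token for token, _ in chunk)
        for chunk in chunk_noun_phrases(tagged)
        if find_head(chunk) in seeds
    ]


if __name__ == "__main__":
    sentence = [
        ("The", "DT"), ("neural", "JJ"), ("network", "NN"), ("uses", "VBZ"),
        ("a", "DT"), ("hash", "NN"), ("table", "NN"), ("and", "CC"),
        ("fresh", "JJ"), ("coffee", "NN"), (".", "."),
    ]
    print(extract_terms(sentence, {"network", "table"}))
```

In this sketch, "fresh coffee" is chunked as a candidate but rejected because its head is not in the seed list. Lemmatization is omitted, so a plural head such as "networks" would not match the seed "network".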
