KX: A Flexible System for Keyphrase eXtraction

This paper presents KX, a keyphrase extraction system developed at FBK-IRST that combines basic linguistic annotation with simple statistical measures to select a list of weighted keywords from a document. The system is flexible: users can set parameters such as frequency thresholds for collocation extraction and indicators of keyphrase relevance, and the system can be adapted to a new domain in an unsupervised way by exploiting a corpus of documents. KX is also easy to adapt to new languages, since it requires only a PoS tagger to derive lexical patterns. In SemEval task 5, "Automatic Key-phrase Extraction from Scientific Articles", KX achieved satisfactory results both in finding reader-assigned keywords and in the combined keywords subtask.
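
The general idea of combining PoS-based lexical patterns with a frequency threshold and a relevance weight can be illustrated with a minimal sketch. This is not KX's implementation: the pattern set, the coarse tag names, the function `extract_keyphrases`, and the parameters `min_freq` and `length_boost` are all assumptions made for illustration, and the input is assumed to be already PoS-tagged.

```python
from collections import Counter

# Hypothetical lexical patterns over coarse PoS tags (not KX's actual patterns).
PATTERNS = {
    ("NOUN",),
    ("ADJ", "NOUN"),
    ("NOUN", "NOUN"),
    ("ADJ", "NOUN", "NOUN"),
}


def extract_keyphrases(tagged_tokens, min_freq=2, length_boost=0.5):
    """Return (phrase, weight) pairs sorted by decreasing weight.

    tagged_tokens: list of (word, pos_tag) pairs from any PoS tagger.
    min_freq: assumed frequency threshold below which candidates are dropped.
    length_boost: assumed relevance indicator that favours longer phrases.
    """
    counts = Counter()
    max_len = max(len(pattern) for pattern in PATTERNS)

    # Collect every n-gram whose tag sequence matches one of the patterns.
    for start in range(len(tagged_tokens)):
        for n in range(1, max_len + 1):
            window = tagged_tokens[start:start + n]
            if len(window) < n:
                break  # ran past the end of the document
            tags = tuple(tag for _, tag in window)
            if tags in PATTERNS:
                phrase = " ".join(word.lower() for word, _ in window)
                counts[phrase] += 1

    # Apply the frequency threshold, then weight the surviving candidates.
    weighted = []
    for phrase, freq in counts.items():
        if freq < min_freq:
            continue
        n_words = len(phrase.split())
        weight = freq * (1 + length_boost * (n_words - 1))
        weighted.append((phrase, weight))

    # Highest weight first; ties are broken alphabetically.
    return sorted(weighted, key=lambda pair: (-pair[1], pair[0]))
```

In this sketch, raising `min_freq` trades recall for precision, and swapping `PATTERNS` for tag sequences produced by a different tagger is the only language-specific step, which mirrors the kind of adaptability the abstract describes.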
