Paper information - Automatic Keyphrase Extraction based on NLP and Statistical Methods

In this article we present our experimental approach to automatic keyphrase extraction based on statistical methods and WordNet-based pattern evaluation. Automatic keyphrases are important for automatic tagging and clustering because manually assigned keyphrases are not sufficient in most cases. Keyphrase candidates are extracted in a new way derived from a combination of graph methods (TextRank) and statistical methods (TF*IDF). Keyword candidates are merged with named entities and stop words according to natural-language part-of-speech (POS) patterns. Automatic keyphrases are generated as TF*IDF weighted unigrams. Keyphrases describe the main ideas of documents in a human-readable way. The approach is evaluated on articles extracted from news websites. Each article contains manually assigned topics/categories, which are used for keyword evaluation.
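The abstract names TextRank and TF*IDF but does not say how the two scores are combined. The following is a minimal Python sketch under stated assumptions: the function names (`tf_idf`, `textrank`, `combined_scores`), the co-occurrence window of 2, the damping factor of 0.85, and the choice to multiply the two scores are all illustrative and are not taken from the paper. It covers only unigram scoring and omits the merging with named entities, stop words, and POS patterns.

```python
import math
from collections import Counter, defaultdict


def tf_idf(docs):
    """Return one {term: tf*idf} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return scores


def textrank(tokens, window=2, damping=0.85, iterations=50):
    """Rank words by PageRank over an undirected co-occurrence graph."""
    neighbours = defaultdict(set)
    for i, word in enumerate(tokens):
        # Link each word to the next `window` words that follow it.
        for other in tokens[i + 1:i + 1 + window]:
            if other != word:
                neighbours[word].add(other)
                neighbours[other].add(word)
    rank = {word: 1.0 for word in neighbours}
    for _ in range(iterations):
        rank = {
            word: (1 - damping) + damping * sum(
                rank[n] / len(neighbours[n]) for n in neighbours[word])
            for word in neighbours
        }
    return rank


def combined_scores(docs, index):
    """Assumed combination: product of TextRank and TF*IDF per unigram."""
    tfidf = tf_idf(docs)[index]
    tr = textrank(docs[index])
    return {word: tr.get(word, 0.0) * tfidf[word] for word in tfidf}


docs = [
    ["keyphrase", "extraction", "uses", "graph", "ranking", "keyphrase"],
    ["graph", "ranking", "uses", "pagerank"],
    ["news", "articles", "uses", "topics"],
]
scores = combined_scores(docs, 0)
for word, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{word}: {score:.3f}")
```

In this toy corpus the word "uses" occurs in every document, so its IDF is zero and it drops out of the ranking regardless of its graph centrality. This shows one way the statistical score can filter candidates that the graph score alone would keep.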

Karel Jezek | Martin Dostál

[1] Nick Cramer, et al. Automatic Keyword Extraction from Individual Documents, 2010.

[2] Carl Gutwin, et al. Improving browsing in digital libraries with keyphrase indexes, 1999, Decis. Support Syst.

[3] Rada Mihalcea, et al. TextRank: Bringing Order into Text, 2004, EMNLP.

[4] Gordon W. Paynter, et al. Automatic extraction of document keyphrases for use in digital libraries: Evaluation and applications, 2002, J. Assoc. Inf. Sci. Technol.

[5] Anette Hulth. Combining Machine Learning and Natural Language Processing for Automatic Keyword Extraction, 2004.

[6] Rajeev Motwani, et al. The PageRank Citation Ranking: Bringing Order to the Web, 1999, WWW 1999.