In this article we present an experimental approach to automatic keyphrase extraction based on statistical methods and WordNet-based pattern evaluation. Automatically generated keyphrases are important for automatic tagging and clustering because manually assigned keyphrases are insufficient in most cases. Keyphrase candidates are extracted in a new way that combines graph methods (TextRank) with statistical methods (TF*IDF). Keyword candidates are merged with named entities and stop words according to natural-language part-of-speech (POS) patterns. Automatic keyphrases are generated as TF*IDF-weighted unigrams. Keyphrases describe the main ideas of documents in a human-readable way. The approach is evaluated on articles extracted from news web sites. Each article contains manually assigned topics/categories, which are used for keyword evaluation.
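As a rough illustration of how a graph score and a statistical score could be combined for unigram candidates, the following minimal sketch computes TF*IDF weights and an unweighted TextRank score over a word co-occurrence graph, then multiplies the two. The function names, the window size, the damping factor, and the product used to combine the scores are all assumptions made for this example; the abstract does not state how the two scores are combined, and the named-entity, stop-word, POS-pattern, and WordNet steps are omitted.

```python
import math
from collections import Counter, defaultdict


def tfidf(docs):
    """docs: list of token lists. Returns one {term: tf*idf} dict per document."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return scores


def textrank(tokens, window=2, damping=0.85, iterations=50):
    """Unweighted TextRank over a co-occurrence graph of the tokens.

    Two tokens are linked if they occur within `window` positions of
    each other (window=2 links adjacent tokens only).
    """
    neighbours = defaultdict(set)
    for i, a in enumerate(tokens):
        for b in tokens[i + 1 : i + window]:
            if a != b:
                neighbours[a].add(b)
                neighbours[b].add(a)
    rank = {t: 1.0 for t in neighbours}
    for _ in range(iterations):
        # Each node receives an equal share of every neighbour's rank.
        rank = {
            t: (1 - damping)
            + damping * sum(rank[u] / len(neighbours[u]) for u in neighbours[t])
            for t in neighbours
        }
    return rank


def combined_scores(docs, index):
    """Hypothetical combination: product of TextRank and TF*IDF scores."""
    weights = tfidf(docs)[index]
    ranks = textrank(docs[index])
    return {t: ranks.get(t, 0.0) * w for t, w in weights.items()}
```

Sorting the dictionary returned by `combined_scores` by value in descending order gives a ranked list of unigram candidates for one document. Terms that occur in every document receive a TF*IDF weight of zero and therefore drop to the bottom regardless of their graph score.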