Paper Information - Improving Keyphrase Extraction Using Wikipedia Semantics

Keyphrase extraction plays a key role in fields such as information retrieval and text classification. However, most traditional keyphrase extraction methods rely on word frequency and position rather than on the document's inherent semantic information, which often results in inaccurate output. In this paper, we propose a novel automatic keyphrase extraction algorithm that uses semantic features mined from online Wikipedia. The algorithm first identifies candidate keyphrases using lexical methods and then constructs a semantic graph that connects the candidate keyphrases with document topics. Next, a link analysis algorithm is applied to assign a semantic feature weight to each candidate keyphrase. Finally, a regression model combines several statistical and semantic features to predict the quality of the candidates. Our experiments yield encouraging results that show the effectiveness of the method.
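The graph-and-link-analysis step described in the abstract can be illustrated with a minimal sketch. The abstract does not say which link analysis algorithm is used or how the semantic graph is built from Wikipedia, so this example substitutes a plain PageRank power iteration as a stand-in. The candidate phrases, topic nodes, and edges are hypothetical and hand-specified, and the `pagerank` helper is an illustrative name, not the authors' code.

```python
from collections import defaultdict


def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over an undirected graph given as (u, v) pairs.

    Stand-in for the unspecified link analysis algorithm in the abstract.
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    nodes = sorted(neighbors)
    # Start from a uniform distribution over all nodes.
    score = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_score = {}
        for n in nodes:
            # Each neighbor shares its score equally among its own neighbors.
            incoming = sum(score[m] / len(neighbors[m]) for m in neighbors[n])
            new_score[n] = (1.0 - damping) / len(nodes) + damping * incoming
        score = new_score
    return score


# Hypothetical toy graph: candidate keyphrases linked to document topics.
edges = [
    ("keyphrase extraction", "topic:Information retrieval"),
    ("keyphrase extraction", "topic:Natural language processing"),
    ("link analysis", "topic:Information retrieval"),
    ("link analysis", "topic:Natural language processing"),
    ("semantic graph", "topic:Natural language processing"),
]

scores = pagerank(edges)
# Keep only candidate nodes; the score would serve as one semantic feature.
candidates = {n: s for n, s in scores.items() if not n.startswith("topic:")}
ranked = sorted(candidates, key=candidates.get, reverse=True)
```

In this toy graph, a candidate linked to more topics receives a higher weight than one linked to a single topic. In the pipeline the abstract outlines, such a weight would be one input among several statistical and semantic features passed to the regression model.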

Minglu Li | Junqi Hou | Shidou Jiao | Tianyi Shi

[1] Peter D. Turney. Learning to Extract Keyphrases from Text, 2002, ArXiv.

[2] Stephen E. Robertson, et al. Simple BM25 extension to multiple weighted fields, 2004, CIKM '04.

[3] Simone Paolo Ponzetto, et al. Deriving a Large-Scale Taxonomy from Wikipedia, 2007, AAAI.

[4] Carl Gutwin, et al. KEA: practical automatic keyphrase extraction, 1999, DL '99.

[5] Carl Gutwin, et al. Domain-Specific Keyphrase Extraction, 1999, IJCAI.

[6] Carl Gutwin, et al. Improving browsing in digital libraries with keyphrase indexes, 1999, Decis. Support Syst.

[7] Simone Paolo Ponzetto, et al. WikiRelate! Computing Semantic Relatedness Using Wikipedia, 2006, AAAI.

[8] Silviu Cucerzan, et al. Large-Scale Named Entity Disambiguation Based on Wikipedia Data, 2007, EMNLP.

[9] Joshua Goodman, et al. Finding advertising keywords on web pages, 2006, WWW '06.

[10] Regina Barzilay, et al. Using Lexical Chains for Text Summarization, 1997.

[11] Evgeniy Gabrilovich, et al. Overcoming the Brittleness Bottleneck using Wikipedia: Enhancing Text Categorization with Encyclopedic Knowledge, 2006, AAAI.