Exploiting Description Knowledge for Keyphrase Extraction

Keyphrase extraction is essential for many IR and NLP tasks. Existing methods usually treat the phrases of a document independently, without modeling the potential semantic correlations among them, or rely on statistical features drawn from knowledge bases such as WordNet and Wikipedia. However, the mutual semantic information between phrases is also important, and exploiting these correlations may help extract keyphrases more effectively. In general, phrases in the title are more likely to be keyphrases that reflect the document's topics, while phrases in the body are usually used to describe those topics. We regard the relation between a title phrase and a body phrase as a description relation. Building on this observation, this paper proposes a novel keyphrase extraction approach that exploits a large number of description relations. To make use of the semantic information these relations provide, we organize the phrases of a document as a description graph and apply various graph-based ranking algorithms to rank the candidate phrases. Experimental results on a real-world dataset demonstrate the effectiveness of the proposed approach for keyphrase extraction.
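To make the idea of ranking candidates on a description graph more concrete, the following is a minimal Python sketch and not the paper's implementation. It assumes that every body phrase is linked to every title phrase by an undirected edge, that repeated mentions add to the edge weight, and that a weighted PageRank serves as the ranking algorithm. The function names `build_description_graph` and `pagerank`, the weighting scheme, and the sample phrases are all hypothetical choices made for illustration; the abstract does not specify how the graph is built or weighted.

```python
from collections import defaultdict


def build_description_graph(title_phrases, body_phrases):
    """Link each body phrase to each title phrase (assumed construction).

    Returns a symmetric adjacency map: graph[u][v] is the edge weight.
    Repeated body phrases accumulate weight on the same edge.
    """
    graph = defaultdict(dict)
    for title in title_phrases:
        for body in body_phrases:
            if title == body:
                continue  # no self-loops
            graph[title][body] = graph[title].get(body, 0.0) + 1.0
            graph[body][title] = graph[body].get(title, 0.0) + 1.0
    return graph


def pagerank(graph, damping=0.85, iterations=50):
    """Weighted PageRank by power iteration on a symmetric adjacency map."""
    nodes = list(graph)
    n = len(nodes)
    scores = {node: 1.0 / n for node in nodes}
    total_weight = {node: sum(graph[node].values()) for node in nodes}
    for _ in range(iterations):
        updated = {}
        for node in nodes:
            # Each neighbor passes on a share of its score proportional
            # to the weight of the edge it shares with this node.
            incoming = sum(
                scores[nb] * weight / total_weight[nb]
                for nb, weight in graph[node].items()
            )
            updated[node] = (1.0 - damping) / n + damping * incoming
        scores = updated
    return scores


# Hypothetical document: phrases from the title and from the body.
title = ["keyphrase extraction", "description knowledge"]
body = ["graph ranking", "keyphrase extraction",
        "semantic correlation", "graph ranking"]

graph = build_description_graph(title, body)
scores = pagerank(graph)
ranked = sorted(scores, key=scores.get, reverse=True)
```

In this sketch `ranked` lists the candidate phrases from highest to lowest score. Any other graph-based ranking algorithm could be substituted for `pagerank` without changing the graph construction step.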
