Integrating Semantic Relatedness and Words' Intrinsic Features for Keyword Extraction

Keyword extraction attracts much attention for its significant role in various natural language processing tasks. While some existing methods for keyword extraction have considered using single type of semantic relatedness between words or inherent attributes of words, almost all of them ignore two important issues: 1) how to fuse multiple types of semantic relations between words into a uniform semantic measurement and automatically learn the weights of the edges between the words in the word graph of each document, and 2) how to integrate the relations between words and words' intrinsic features into a unified model. In this work, we tackle the two issues based on the supervised random walk model. We propose a supervised ranking based method for keyword extraction, which is called SEAFARER1. It can not only automatically learn the weights of the edges in the unified graph of each document which includes multiple semantic relations but also combine the merits of semantic relations of edges and intrinsic attributes of nodes together. We conducted extensive experimental study on an established benchmark and the experimental results demonstrate that SEAFARER outperforms the state-of-the-art supervised and unsupervised methods.

[1]  Yang Song,et al.  Topical Keyphrase Extraction from Twitter , 2011, ACL.

[2]  Jiawei Han,et al.  Keyword extraction for social snippets , 2010, WWW '10.

[3]  Maria P. Grineva,et al.  Extracting key terms from noisy and multitheme documents , 2009, WWW '09.

[4]  Zhiyuan Liu,et al.  Clustering to Find Exemplar Terms for Keyphrase Extraction , 2009, EMNLP.

[5]  Xin Jiang,et al.  A ranking approach to keyphrase extraction , 2009, SIGIR.

[6]  Jianyong Wang,et al.  Incorporating heterogeneous information for personalized tag recommendation in social tagging systems , 2012, KDD.

[7]  Taher H. Haveliwala Topic-sensitive PageRank , 2002, IEEE Trans. Knowl. Data Eng..

[8]  J. Friedman Greedy function approximation: A gradient boosting machine. , 2001 .

[9]  Katja Hofmann,et al.  The impact of document structure on keyphrase extraction , 2009, CIKM.

[10]  Xiaojun Wan,et al.  Single Document Keyphrase Extraction Using Neighborhood Knowledge , 2008, AAAI.

[11]  Tie-Yan Liu,et al.  Learning to rank: from pairwise approach to listwise approach , 2007, ICML '07.

[12]  Rada Mihalcea,et al.  TextRank: Bringing Order into Text , 2004, EMNLP.

[13]  Michael I. Jordan,et al.  Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..

[14]  Songhua Xu,et al.  Keyword Extraction and Headline Generation Using Novel Word Features , 2010, Proceedings of the AAAI Conference on Artificial Intelligence.

[15]  Thorsten Joachims,et al.  Training linear SVMs in linear time , 2006, KDD '06.

[16]  Zhiyuan Liu,et al.  Automatic Keyphrase Extraction via Topic Decomposition , 2010, EMNLP.

[17]  Jure Leskovec,et al.  Supervised random walks: predicting and recommending links in social networks , 2010, WSDM '11.

[18]  Anette Hulth,et al.  Improved Automatic Keyword Extraction Given More Linguistic Knowledge , 2003, EMNLP.

[19]  Peter D. Turney Learning Algorithms for Keyphrase Extraction , 2000, Information Retrieval.