Keyphrase Extraction Using Knowledge Graphs

Extracting keyphrases from documents automatically is an important and interesting task since keyphrases provide a quick summarization for documents. Although lots of efforts have been made on keyphrase extraction, most of the existing methods (the co-occurrence-based methods and the statistic-based methods) do not take semantics into full consideration. The co-occurrence-based methods heavily depend on the co-occurrence relations between two words in the input document, which may ignore many semantic relations. The statistic-based methods exploit the external text corpus to enrich the document, which introduce more unrelated relations inevitably. In this paper, we propose a novel approach to extract keyphrases using knowledge graphs, based on which we could detect the latent relations of two keyterms (i.e., noun words and named entities) without introducing many noises. Extensive experiments over real data show that our method outperforms the state-of-the-art methods including the graph-based co-occurrence methods and statistic-based clustering methods.

[1]  Gideon S. Mann,et al.  Generalized Expectation Criteria for Semi-Supervised Learning with Weakly Labeled Data , 2010, J. Mach. Learn. Res..

[2]  Xiaojun Wan,et al.  Exploiting neighborhood knowledge for single document summarization and keyphrase extraction , 2010, TOIS.

[3]  Mihai Surdeanu,et al.  The Stanford CoreNLP Natural Language Processing Toolkit , 2014, ACL.

[4]  Carl Gutwin,et al.  KEA: practical automatic keyphrase extraction , 1999, DL '99.

[5]  Zhe Chen,et al.  EgoSet: Exploiting Word Ego-networks and User-generated Ontology for Multifaceted Set Expansion , 2016, WSDM.

[6]  Zhe Chen,et al.  Long-tail Vocabulary Dictionary Extraction from the Web , 2016, WSDM.

[7]  William W. Cohen,et al.  Language-Independent Set Expansion of Named Entities Using the Web , 2007, Seventh IEEE International Conference on Data Mining (ICDM 2007).

[8]  Rada Mihalcea,et al.  Wikify!: linking documents to encyclopedic knowledge , 2007, CIKM '07.

[9]  Rajeev Motwani,et al.  The PageRank Citation Ranking : Bringing Order to the Web , 1999, WWW 1999.

[10]  S. P. Lloyd,et al.  Least squares quantization in PCM , 1982, IEEE Trans. Inf. Theory.

[11]  Jeffrey Dean,et al.  Efficient Estimation of Word Representations in Vector Space , 2013, ICLR.

[12]  Maria P. Grineva,et al.  Extracting key terms from noisy and multitheme documents , 2009, WWW '09.

[13]  Leonard M. Freeman,et al.  A set of measures of centrality based upon betweenness , 1977 .

[14]  Jennifer Widom,et al.  SimRank: a measure of structural-context similarity , 2002, KDD.

[15]  Florian Boudin,et al.  A Comparison of Centrality Measures for Graph-Based Keyphrase Extraction , 2013, IJCNLP.

[16]  Alex Bavelas,et al.  Communication Patterns in Task‐Oriented Groups , 1950 .

[17]  Mohamed S. Kamel,et al.  CorePhrase: Keyphrase Extraction for Document Clustering , 2005, MLDM.

[18]  Myong Kee Jeong,et al.  Class dependent feature scaling method using naive Bayes classifier for text datamining , 2009, Pattern Recognit. Lett..

[19]  Anette Hulth,et al.  Improved Automatic Keyword Extraction Given More Linguistic Knowledge , 2003, EMNLP.

[20]  Taher H. Haveliwala Topic-sensitive PageRank , 2002, IEEE Trans. Knowl. Data Eng..

[21]  Matt J. Kusner,et al.  From Word Embeddings To Document Distances , 2015, ICML.

[22]  Peter Norvig,et al.  Artificial intelligence - a modern approach: the intelligent agent book , 1995, Prentice Hall series in artificial intelligence.

[23]  Christian Bizer,et al.  DBpedia spotlight: shedding light on the web of documents , 2011, I-Semantics '11.

[24]  Zhiyuan Liu,et al.  Clustering to Find Exemplar Terms for Keyphrase Extraction , 2009, EMNLP.

[25]  Wei Shi,et al.  Keyphrase Extraction Using Knowledge Graphs , 2017, Data Science and Engineering.

[26]  Xiaojun Wan,et al.  Towards an Iterative Reinforcement Approach for Simultaneous Document Summarization and Keyword Extraction , 2007, ACL.

[27]  Iraklis Varlamis,et al.  SemanticRank: Ranking Keywords and Sentences Using Semantic Graphs , 2010, COLING.

[28]  Zhiyuan Liu,et al.  Automatic Keyphrase Extraction via Topic Decomposition , 2010, EMNLP.

[29]  Paul M. B. Vitányi,et al.  The Google Similarity Distance , 2004, IEEE Transactions on Knowledge and Data Engineering.

[30]  Zhi Zhou,et al.  Keyphrase Extraction Using Semantic Networks Structure Analysis , 2006, Sixth International Conference on Data Mining (ICDM'06).

[31]  Peter D. Turney Learning Algorithms for Keyphrase Extraction , 2000, Information Retrieval.

[32]  Xin Jiang,et al.  A ranking approach to keyphrase extraction , 2009, SIGIR.

[33]  Peter D. Turney Learning to Extract Keyphrases from Text , 2002, ArXiv.

[34]  Jens Lehmann,et al.  DBpedia - A crystallization point for the Web of Data , 2009, J. Web Semant..

[35]  Rada Mihalcea,et al.  TextRank: Bringing Order into Text , 2004, EMNLP.

[36]  Michael I. Jordan,et al.  Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..

[37]  Anette Hulth Reducing false positives by expert combination in automatic keyword indexing , 2003, RANLP.