A graph-based unsupervised N-gram filtration technique for automatic keyphrase extraction

In this paper, we present a novel N-gram (N ≥ 1) filtration technique for keyphrase extraction. To filter the sophisticated candidate keyphrases (N-grams), we combine two features: (1) a statistical feature, obtained from the weighted betweenness centrality scores of words, a measure generally used to identify border nodes/edges in community detection techniques; and (2) co-location strength, calculated using nearest-neighbour DBpedia texts. We also introduce an N-gram (N ≥ 1) graph, which reduces the bias towards shorter N-grams in the ranking process and preserves the semantics of words (phraseness) based on local context. To capture the theme of the document and to reduce the effect of noisy terms in the ranking process, we apply an information-theoretic framework for key-player detection to the proposed N-gram graph. Our experimental results show that the devised system performs better than current state-of-the-art unsupervised systems and comparably to or better than supervised systems.
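
As a rough illustration of the statistical feature named above, the following minimal sketch computes weighted betweenness centrality scores for words. It is not the paper's implementation: the abstract does not specify how the word graph is built or weighted, so the sliding co-occurrence window, the edge distance of 1 / co-occurrence count, and the function names `cooccurrence_graph` and `weighted_betweenness` are all assumptions made for this example. The centrality itself follows Brandes' standard algorithm with Dijkstra shortest paths.

```python
import heapq
from collections import defaultdict

EPS = 1e-12  # tolerance for treating two path lengths as equal


def cooccurrence_graph(tokens, window=2):
    """Undirected word graph (assumed construction, not from the paper).

    graph[u][v] counts how often u and v fall inside the same window
    of `window` consecutive tokens.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for i, u in enumerate(tokens):
        for v in tokens[i + 1 : i + window]:
            if u != v:
                graph[u][v] += 1
                graph[v][u] += 1
    return graph


def weighted_betweenness(graph):
    """Unnormalized betweenness centrality via Brandes' algorithm.

    Assumption: edge distance = 1 / co-occurrence count, so words that
    co-occur often are treated as close.
    """
    score = {v: 0.0 for v in graph}
    for s in graph:
        dist = {s: 0.0}    # best known distance from s
        sigma = {s: 1.0}   # number of shortest paths from s
        preds = {s: []}    # predecessors on shortest paths
        done, order = set(), []
        heap = [(0.0, s)]
        while heap:
            d, v = heapq.heappop(heap)
            if v in done:
                continue  # stale heap entry
            done.add(v)
            order.append(v)
            for w, count in graph[v].items():
                nd = d + 1.0 / count
                if w not in dist or nd < dist[w] - EPS:
                    dist[w], sigma[w], preds[w] = nd, sigma[v], [v]
                    heapq.heappush(heap, (nd, w))
                elif abs(nd - dist[w]) <= EPS:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies in order of decreasing distance.
        delta = dict.fromkeys(order, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: c / 2.0 for v, c in score.items()}


if __name__ == "__main__":
    tokens = "keyphrase extraction uses graph ranking for keyphrase filtration".split()
    scores = weighted_betweenness(cooccurrence_graph(tokens))
    for word, value in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(f"{word}\t{value:.2f}")
```

Words with high scores sit on many shortest paths between other words, which is why the measure is used to locate border nodes between communities. How these scores are turned into a filtering decision for candidate N-grams is not described in the abstract and is left out of this sketch.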
