A topological collapse for document summarization

Keyphrase extraction is a useful tool for summarizing documents: it extracts a set of single- or multi-word expressions, called keyphrases, that capture the primary topics discussed in a document. In this paper we propose DoCollapse, an unsupervised keyphrase extraction method based on topological collapse, which represents a document as a network of candidate keyphrases linked by semantic relatedness. A semantic graph is built with candidate keyphrases as vertices and then reduced to its core using a topological collapse algorithm to facilitate final keyphrase selection. Iteratively collapsing dominated vertices helps remove noisy candidates and reveal important points. We conducted experiments on two standard evaluation datasets composed of scientific papers and found that DoCollapse outperforms state-of-the-art methods. The results show that simplifying a document graph by homology-preserving topological collapse benefits keyphrase extraction.
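To make the idea of iteratively collapsing dominated vertices concrete, the following is a minimal sketch and not the authors' implementation. It assumes the usual strong-collapse notion of dominance: a vertex v is dominated by a neighbour u when the closed neighbourhood of v (v together with its neighbours) is contained in that of u. The function name `collapse_dominated`, the adjacency-dictionary representation, and the toy graph are all hypothetical; how DoCollapse weights edges or selects final keyphrases from the core is not shown here.

```python
def collapse_dominated(adj):
    """Iteratively remove dominated vertices from an undirected graph.

    adj maps each vertex to the set of its neighbours. Assumption: v is
    dominated by u when N[v] is a subset of N[u] (closed neighbourhoods).
    Returns the reduced graph as a new adjacency dictionary.
    """
    graph = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    changed = True
    while changed:
        changed = False
        for v in sorted(graph):
            closed_v = graph[v] | {v}
            # A dominator must be adjacent to v, so only neighbours are checked.
            dominator = next(
                (u for u in sorted(graph[v]) if closed_v <= graph[u] | {u}),
                None,
            )
            if dominator is not None:
                # Remove v and every edge incident to it, then rescan.
                for n in graph[v]:
                    graph[n].discard(v)
                del graph[v]
                changed = True
                break
    return graph


# Hypothetical toy graph: a 4-cycle a-b-c-d with a pendant vertex e on a.
toy = {
    "a": {"b", "d", "e"},
    "b": {"a", "c"},
    "c": {"b", "d"},
    "d": {"a", "c"},
    "e": {"a"},
}
core = collapse_dominated(toy)
print(sorted(core))
```

In this toy graph the pendant vertex is dominated by its only neighbour and is removed, while the 4-cycle has no dominated vertex and survives intact, which illustrates how a collapse of this kind can prune peripheral candidates without destroying a cycle in the graph.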
