Automatic Labeling of Topic Models Using Graph-Based Ranking

Generated topic labels, an alternative representation of the topics learned by a topic model, are widely used to help users interpret topics more efficiently. A major challenge is to label a discovered topic accurately and objectively. This article introduces a novel graph-based ranking model, TLRank, that finds meaningful topic labels with high Relevance, Coverage, and Discrimination. The model suppresses or enhances the transition probabilities in the ranking matrix according to the textual similarity between vertices (sentences) and the characteristics of the vertices, respectively. Moreover, to boost diversity and improve performance, TLRank scores the candidate sentences and suppresses redundancy among topic labels simultaneously, in a single labeling process. In our experiments, the TLRank model significantly and consistently outperformed both state-of-the-art and classic models on the topic labeling task.
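To make the general idea of graph-based sentence ranking concrete, the following is a minimal sketch and not the TLRank algorithm itself, whose exact transition adjustments are not given in this abstract. It assumes a plain TextRank-style random walk over a bag-of-words cosine-similarity graph, followed by a separate greedy redundancy penalty (TLRank, by contrast, is described as handling both in a single process). The function names `cosine`, `rank_sentences`, and `select_labels`, and the parameters `damping` and `penalty`, are hypothetical choices for illustration only.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank_sentences(sentences, damping=0.85, iters=100):
    """Score sentences with a PageRank-style walk on a similarity graph.

    Assumption: edge weights are plain cosine similarities; TLRank's own
    suppression/enhancement of transition probabilities is not modeled.
    """
    bags = [Counter(s.lower().split()) for s in sentences]
    n = len(bags)
    sim = [[cosine(bags[i], bags[j]) if i != j else 0.0 for j in range(n)]
           for i in range(n)]
    # Row-normalize similarities into transition probabilities.
    # A vertex with no similar neighbor jumps uniformly at random.
    trans = []
    for row in sim:
        total = sum(row)
        trans.append([x / total for x in row] if total > 0 else [1.0 / n] * n)
    scores = [1.0 / n] * n
    for _ in range(iters):
        scores = [
            (1 - damping) / n
            + damping * sum(scores[j] * trans[j][i] for j in range(n))
            for i in range(n)
        ]
    return scores, sim


def select_labels(sentences, k=2, penalty=1.0):
    """Greedily pick k sentences, penalizing similarity to earlier picks."""
    scores, sim = rank_sentences(sentences)
    chosen = []
    remaining = list(range(len(sentences)))
    while remaining and len(chosen) < k:
        best = max(
            remaining,
            key=lambda i: scores[i]
            - penalty * max((sim[i][j] for j in chosen), default=0.0),
        )
        chosen.append(best)
        remaining.remove(best)
    return [sentences[i] for i in chosen]
```

Calling `select_labels(candidates, k=2)` on a list of candidate sentences returns two of them; with the default `penalty`, an exact duplicate of an already chosen sentence is pushed below any sentence that shares no words with it, which is the diversity effect the abstract refers to.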
