In Search of Coherence and Consensus: Measuring the Interpretability of Statistical Topics

Topic modeling is an important tool in natural language processing. Topic models provide two forms of output. The first is a predictive model, which can predict properties of unseen documents (e.g., their categories). When topic models are used in this way, there are ample measures to assess their performance. The second output is the topics themselves. Each topic is presented as a list of its top keywords. Often, these keyword lists are shown to a human subject who then judges the meaning of the topic, a judgment that is ultimately subjective. A fundamental problem of topic models therefore lies in assessing the quality of topics from the perspective of human interpretability. Human subjects are naturally needed to evaluate the interpretability of a topic, and crowdsourcing is now widely used to fill that role. In this work we study measures of interpretability and propose to measure topic interpretability from two perspectives: topic coherence and topic consensus. We start with an existing measure of topic coherence, model precision. It evaluates the coherence of a topic by inserting an intruder word into the topic's keyword list and measuring how well a human subject or a crowd of workers can identify it: if the intruder is easy to identify, the topic is coherent. We then investigate how coherence can be measured more comprehensively by examining its dimensions. For the second perspective, we propose topic consensus, which measures how well the results of a crowdsourcing approach match the given categories of topics. Good topics should lead to good categories and thus to high topic consensus. Conversely, if topic consensus is low in terms of categories, the topics may be of low interpretability.
We then discuss how topic coherence and topic consensus assess different aspects of topic interpretability, and we hope that this work can pave the way for comprehensive measures of topic interpretability.
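
To make the two perspectives concrete, the following is a minimal Python sketch, not the authors' implementation. `model_precision` follows the word-intrusion idea described above: it is the fraction of annotators who pick the true intruder. `consensus_score` is an assumption made purely for illustration: the abstract does not state how agreement between crowd answers and the given categories is computed, so this sketch uses one minus the Jensen-Shannon divergence (base 2) between the two category distributions. All function names, parameter names, and example data are hypothetical.

```python
from collections import Counter
from math import log2


def model_precision(choices, intruder):
    """Fraction of annotator choices that match the true intruder word."""
    if not choices:
        raise ValueError("need at least one annotator response")
    return sum(1 for c in choices if c == intruder) / len(choices)


def _normalize(weights, categories):
    """Turn a mapping of category -> weight into a probability list."""
    total = sum(weights.get(c, 0) for c in categories)
    if total == 0:
        raise ValueError("no mass on the listed categories")
    return [weights.get(c, 0) / total for c in categories]


def js_divergence(p, q):
    """Jensen-Shannon divergence with log base 2, so the result is in [0, 1]."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i == 0 contribute nothing; y_i > 0 whenever x_i > 0.
        return sum(a * log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def consensus_score(crowd_labels, reference_weights, categories):
    """ASSUMPTION: agreement = 1 - JS divergence between the crowd's category
    votes and a reference category distribution for the topic."""
    crowd = _normalize(Counter(crowd_labels), categories)
    reference = _normalize(reference_weights, categories)
    return 1.0 - js_divergence(crowd, reference)


if __name__ == "__main__":
    # Hypothetical intrusion task: three of four workers spot the intruder "car".
    print(model_precision(["apple", "car", "car", "car"], "car"))
    # Hypothetical category task: crowd votes versus a given category mix.
    cats = ["sports", "politics"]
    votes = ["sports", "sports", "politics", "sports"]
    print(consensus_score(votes, {"sports": 0.75, "politics": 0.25}, cats))
```

In this sketch a model precision of 0.75 means three of the four hypothetical workers found the intruder, and a consensus score near 1 means the crowd's category votes closely track the reference distribution. Other agreement measures could be substituted for the divergence without changing the overall idea.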
