Mining causal topics in text data: iterative topic modeling with time series feedback

Many applications require analyzing textual topics in conjunction with external time series variables such as stock prices. We develop a novel, general text mining framework for discovering such causal topics from text. The framework combines any given probabilistic topic model with time-series causal analysis to discover topics that are both semantically coherent and correlated with the time series data. Topics are refined iteratively, so that their correlation with the time series increases from one iteration to the next. At each iteration, the time series data provides feedback by imposing prior distributions on the topic model's parameters. Experimental results show that the proposed framework is effective.
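As a rough illustration of the feedback step described above, the following minimal sketch scores each word's count series against an external time series and turns strongly associated words into prior pseudo-counts for the next topic-model run. It makes several assumptions that are not taken from the paper: a lagged Pearson correlation stands in for a proper causal test, the function names (`pearson`, `lagged_correlation`, `word_priors`), the `threshold` and `strength` parameters, and the toy data are all hypothetical, and only a single feedback step is shown rather than the full iterative loop.

```python
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation; returns 0.0 when either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def lagged_correlation(text_series, target_series, lag=1):
    """Correlate the text series at time t with the target series at time t + lag."""
    if lag < 1 or lag >= len(text_series):
        raise ValueError("lag must be between 1 and len(series) - 1")
    return pearson(text_series[:-lag], target_series[lag:])


def word_priors(word_counts, target_series, lag=1, threshold=0.5, strength=10.0):
    """Map words whose lagged correlation is strong enough to prior pseudo-counts.

    Assumption: a stand-in for a causal significance test; the pseudo-count is
    simply proportional to the absolute correlation.
    """
    priors = {}
    for word, counts in word_counts.items():
        r = lagged_correlation(counts, target_series, lag)
        if abs(r) >= threshold:
            priors[word] = strength * abs(r)
    return priors


# Toy data (hypothetical): "rally" counts lead the target by one step,
# while "weather" counts are constant and carry no signal.
target = [10, 12, 11, 15, 14, 18]
counts = {
    "rally": [12, 11, 15, 14, 18, 16],
    "weather": [3, 3, 3, 3, 3, 3],
}
priors = word_priors(counts, target, lag=1)
```

In a full pipeline, the resulting `priors` dictionary would be passed to the topic model as a prior over topic-word distributions before re-estimating topics, and the scoring would then be repeated on the refined topics.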
