Topic Flow Model: A Graph Theoretic Temporal Topic Model for Noisy Mediums

In the modern era, data is being created faster than ever. Social media, in particular, churns out hundreds of millions of short documents a day. It would be useful to understand the underlying topics being discussed on popular channels of social media, and how those discussions evolve over time. There exist state-of-the-art topic models that accurately classify texts large and small, but few attempt to follow topics through time, and many are adversely affected by the large amount of noise in social media documents. We propose Topic Flow Model (TFM), a graph-theoretic temporal topic model that identifies topics as they emerge, and tracks them through time as they persist, diminish, and re-emerge. TFM identifies topic words by capturing the changing relationship strength of words over time, and offers solutions for dealing with flood words, i.e., domain-specific words that pollute topics. We conduct an extensive empirical analysis of TFM on Twitter data, newspaper articles, and synthetic data and find that the topic accuracy and signal-to-noise ratio are better than those of state-of-the-art methods.

Index words: Data Mining, Topic Modeling, Social Graphs, Text Mining, Natural Language Processing, Graph Mining, Temporal Topics