Extracting time series variation of topic popularity in microblogs

Extracting topics and their popularity from microblogs is a promising approach to discovering what is currently popular in the world. To address this task, several methods that estimate topic popularity based on Latent Dirichlet Allocation (LDA) have been proposed. However, LDA fails to extract good topics from a collection of short text documents such as microblogs because the word co-occurrence information within an individual document is sparse. Therefore, to extract topics from microblogs, a model specialized for short text documents should be used. In this paper, we propose a topic popularity estimation method using the Biterm Topic Model (BTM), which alleviates the problem caused by document-level word co-occurrence sparsity. We extract topics from the microblog documents with BTM for each time period and estimate how frequently each topic occurs. The proposed method can analyze topic popularity in real time because we apply an efficient inference algorithm for BTM to small batches of tweets. Experiments on a collection of tweets show that some of the topics extracted by the proposed method correspond to real-world events and that a topic's burstiness increases when the corresponding event occurs.
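
As a rough illustration of the two ideas in the abstract, the following minimal sketch shows (1) how a short document can be turned into biterms, i.e. unordered word pairs, so that co-occurrence is pooled over the whole collection rather than within a single document, and (2) how a per-period topic frequency could be computed once each biterm has a topic assignment. The function names `extract_biterms` and `topic_shares`, the toy topic assignments, and the use of the share of biterms per topic as the popularity measure are all assumptions made for illustration; the BTM inference step itself is not shown, and this is not necessarily the estimator used in the paper.

```python
from collections import Counter
from itertools import combinations


def extract_biterms(tokens):
    """Return every unordered word pair (biterm) in one short document.

    Each pair is sorted so that (a, b) and (b, a) count as the same biterm.
    """
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def topic_shares(biterm_topics, num_topics):
    """Return the fraction of biterms assigned to each topic in one period.

    `biterm_topics` is a list of topic indices, one per biterm. In practice
    these would come from BTM inference; here they are supplied by hand.
    """
    counts = Counter(biterm_topics)
    total = len(biterm_topics)
    if total == 0:
        return [0.0] * num_topics
    return [counts.get(k, 0) / total for k in range(num_topics)]


# Toy example: one tweet tokenized into three words.
biterms = extract_biterms(["earthquake", "japan", "tsunami"])

# Hypothetical topic assignments for the biterms of two time periods.
assignments_by_period = {
    "period_1": [0, 0, 1, 1],
    "period_2": [0, 1, 1, 1],
}
popularity = {
    period: topic_shares(topics, num_topics=2)
    for period, topics in assignments_by_period.items()
}
```

In this toy data, topic 1 accounts for a larger share of biterms in `period_2` than in `period_1`, which is the kind of period-to-period change a popularity time series would capture.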