Blog topic analysis using TF smoothing and LDA

In the era of Web 2.0, the number of blogs has explosively increased. With the appearance of social network services, blogs has become the places for sharing professional knowledge and personal branding. So, in order to understand the trends of topics or to analyze the content of blogs, the time sensitive topic extraction and topic change analysis is important and necessary. In the previous studies, most of topic extraction models extracted topic words independently from each time slice and tried to combine those. However, these methods did not show a good performance in analyzing topic trends because the topics extracted from time slices are independent. To cope with this problem, we propose a term frequency smoothing method which weaves time slices so that the more related topics are extracted from each time slice and a better topic trend analysis is generated. In order to extract topics from smoothed term frequencies, LDA, a generative topic model, is adopted. The evaluation of the proposed method on IT blogs shows that it can effectively discover quite meaningful topic patterns and topic words.