Enhancing Short Text Topic Modeling with FastText Embeddings

The rapid growth of online social media over the past few years has produced a wide variety of short texts, and understanding their topic patterns is important. Traditional topic models are poorly suited to short texts because the data are sparse: each document contains too few words to provide reliable word co-occurrence statistics. In this paper, we propose a novel topic model, referred to as the FastText-based Sentence-LDA (FSL) model, which extends the Sentence-LDA topic model to short texts. First, we use FastText to train a word embedding replacement model, which alleviates the lack of word co-occurrence information in short texts. Second, we propose a new latent feature topic model that integrates latent feature word embeddings into Sentence-LDA. Experimental results show that, by drawing on information from external corpora, the new model significantly improves topic coherence.
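The abstract does not spell out how the word embedding replacement step works. The sketch below illustrates one plausible reading, offered as an assumption rather than as the FSL procedure: each rare word is mapped to its most similar frequent word under cosine similarity, so that sparse short texts share more vocabulary. The function name `replace_rare_words`, the `min_similarity` threshold, and the three-dimensional toy vectors are all hypothetical; in practice the vectors would come from a trained FastText model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def replace_rare_words(tokens, embeddings, frequent_vocab, min_similarity=0.8):
    """Map each rare token to its nearest frequent word in embedding space.

    Assumption for illustration only: a token is "rare" if it is not in
    frequent_vocab, and it is replaced only when the best match reaches
    min_similarity. Tokens without a vector are left unchanged.
    """
    replaced = []
    for token in tokens:
        if token in frequent_vocab or token not in embeddings:
            replaced.append(token)
            continue
        best_word, best_sim = token, min_similarity
        for candidate in frequent_vocab:
            if candidate not in embeddings:
                continue
            sim = cosine(embeddings[token], embeddings[candidate])
            if sim >= best_sim:
                best_word, best_sim = candidate, sim
        replaced.append(best_word)
    return replaced


# Hand-made toy vectors standing in for trained FastText embeddings.
TOY_EMBEDDINGS = {
    "football": [0.90, 0.10, 0.00],
    "footy": [0.85, 0.15, 0.05],
    "election": [0.00, 0.90, 0.10],
    "polls": [0.05, 0.85, 0.20],
    "banana": [0.00, 0.00, 1.00],
}
FREQUENT_VOCAB = ["football", "election"]

if __name__ == "__main__":
    short_text = ["footy", "polls", "banana", "tonight"]
    print(replace_rare_words(short_text, TOY_EMBEDDINGS, FREQUENT_VOCAB))
```

In this toy setting, "footy" and "polls" are replaced by their frequent neighbours, while "banana" has no sufficiently similar frequent word and "tonight" has no vector, so both are kept. A real FastText model composes vectors from character n-grams and can therefore also produce vectors for words it never saw during training, which the toy dictionary does not attempt to imitate.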