Evaluation of Probabilistic Topic Extraction Approaches for Short Documents

Short texts are very common on social media; comments and reviews are typical examples found on the Web. Extracting topics from text is a challenging task in content analysis, and probabilistic topic modelling has recently been used as a tool for it. Extracting topics from short documents is more challenging still, because word co-occurrence is sparser in such texts. The aim of this work is therefore to evaluate several topic modelling approaches for short documents and to identify which is most suitable in the proposed scenarios. We conduct experiments on three short-text collections, and the results show that the approaches perform similarly.