Topic Modeling over Text Streams from Social Media

Topic modeling has become a popular research area that offers new ways to search, browse, and summarize large amounts of text. Topic modeling methods try to uncover the hidden thematic structure in document collections. Social networks are among the strongest communication tools and produce large volumes of opinions and attitudes on world events, so topic modeling applied to them can be useful for analysis in situations such as crises, elections, or the launch of a new product on the market. For that reason, in this paper we propose a tool for topic modeling over text streams from social networks. The description of the proposed tool is complemented by practical experiments. The experiments showed promising results when using our tool on real data in comparison to state-of-the-art methods.
