MR-LDA: An Efficient Topic Model for Classification of Short Text in Big Social Data

Latent Dirichlet Allocation (LDA) is an efficient method for text mining, but applying it directly to Chinese micro-blog texts does not work well, because micro-blogs are more social and briefer than ordinary documents and are closely related to one another. Building on LDA, this paper proposes a Micro-blog Relation LDA model (MR-LDA), which takes the relations among Chinese micro-blog documents into account to support topic mining in micro-blogs. The authors extend LDA in two ways. First, they aggregate several Chinese micro-blogs into a single micro-blog document to address the short-text problem. Second, they model the generation process of Chinese micro-blogs more accurately by taking the relationships between micro-blog documents into account. As a result, MR-LDA is better suited than standard LDA to modeling Chinese micro-blog data. Gibbs sampling is used to infer the model. Experimental results on real datasets show that MR-LDA offers an effective solution to text mining for Chinese micro-blogs.
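The two ingredients named in the abstract, pooling short posts into longer documents and inferring topics by Gibbs sampling, can be illustrated with a minimal Python sketch. It is not the authors' MR-LDA: it runs collapsed Gibbs sampling for plain LDA on pooled documents and does not model the relations between documents. The function names, the grouping key (a hypothetical thread id), and the hyperparameter values are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def aggregate_posts(posts):
    """Pool short posts that share a group key into one pseudo-document.

    `posts` is a list of (group_key, tokens) pairs. The group key (for
    example a thread id) is a hypothetical stand-in for whatever relation
    is used to pool micro-blogs.
    """
    pooled = defaultdict(list)
    for key, tokens in posts:
        pooled[key].extend(tokens)
    return dict(pooled)


def gibbs_lda(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for plain LDA (assumption: not MR-LDA).

    Returns (theta, n_k): per-document topic proportions and the number
    of tokens assigned to each topic.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    n_dk = [[0] * num_topics for _ in docs]               # doc-topic counts
    n_kw = [defaultdict(int) for _ in range(num_topics)]  # topic-word counts
    n_k = [0] * num_topics                                # tokens per topic

    # Random initial topic assignment for every token.
    z = []
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(num_topics)
            z[d].append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Unnormalised conditional probability of each topic.
                weights = [
                    (n_dk[d][t] + alpha)
                    * (n_kw[t][w] + beta) / (n_k[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    theta = [
        [(n_dk[d][k] + alpha) / (len(doc) + num_topics * alpha)
         for k in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    return theta, n_k


# Toy usage: four short posts pooled into two pseudo-documents.
posts = [
    ("t1", ["rain", "flood"]), ("t1", ["flood", "storm"]),
    ("t2", ["goal", "match"]), ("t2", ["match", "team"]),
]
pooled = aggregate_posts(posts)
theta, n_k = gibbs_lda(list(pooled.values()), num_topics=2)
```

Pooling gives each document enough word co-occurrences for the sampler to work with, which is the motivation the abstract gives for aggregation. Extending the sampler so that the relations between documents influence topic assignment would be the MR-LDA-specific part and is not attempted here.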
