An efficient framework for real-time tweet classification

The growing popularity of social networking sites such as Facebook, Twitter, and Google+ is contributing to the rapid proliferation of big data. Among social networking sites, Twitter is one of the most common sources of big data, where people from across the world share their views on various topics and subjects. With a daily active user count of more than 100 million, Twitter is becoming a rich information source for finding trends and current happenings around the world. Twitter does provide a limited “trends” feature. To make Twitter trends more interesting and informative, in this paper we propose a framework that can analyze Twitter data and classify tweets on a specific subject to generate trends. We illustrate the use of the framework by analyzing tweets in the “Politics” domain. To classify tweets, we propose a tweet classification algorithm that classifies tweets efficiently.
