Towards Classification of Social Streams

Social streams have become very popular in recent years because of the increasing popularity of social media sites such as Twitter and Facebook. Such sites create huge streams of data, which can be leveraged for a wide variety of applications. In this paper, we focus on the classification problem for social streams. Unfortunately, such streams are extremely noisy and contain large volumes of information. In addition to the text, they carry information about the network linkages between the participants exchanging messages. This additional social information, associated with the text stream, can be very helpful for classification. We combine a locality-sensitive hashing (LSH) method with an incremental support vector machine (SVM) model in order to design an effective and efficient social context-sensitive streaming classifier for this scenario. The LSH model is used for learning the social context, and the SVM model is used for more effective classification within this context. We present experimental results that show the effectiveness of our techniques compared with a wide variety of other methods.
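The division of labor between the two components can be illustrated with a minimal sketch. It is not the implementation evaluated in the paper: the abstract does not say which LSH family or which incremental SVM update is used, so the sketch assumes MinHash over the set of message participants as the LSH step and a Pegasos-style stochastic subgradient update on the hinge loss as a stand-in for the incremental SVM. All names (`minhash_signature`, `OnlineLinearSVM`, `ContextSensitiveClassifier`) and the toy stream are hypothetical.

```python
import hashlib


def minhash_signature(participants, num_hashes=4):
    """MinHash signature of a non-empty set of participant ids.

    Assumption: social context is approximated by who exchanges the message.
    Sets with high Jaccard similarity are likely to share a signature.
    """
    sig = []
    for i in range(num_hashes):
        sig.append(min(
            int(hashlib.md5(f"{i}:{p}".encode("utf-8")).hexdigest(), 16)
            for p in participants
        ))
    return tuple(sig)


class OnlineLinearSVM:
    """Linear SVM trained one example at a time (Pegasos-style update).

    This is a stand-in for an incremental SVM, not an exact incremental solver.
    """

    def __init__(self, lam=0.01):
        self.w = {}      # sparse weight vector: feature -> weight
        self.lam = lam   # regularization strength
        self.t = 0       # number of updates seen so far

    def score(self, x):
        return sum(self.w.get(f, 0.0) * v for f, v in x.items())

    def update(self, x, y):
        """x: dict feature -> value, y: label in {-1, +1}."""
        self.t += 1
        eta = 1.0 / (self.lam * self.t)
        margin = y * self.score(x)          # margin under the current weights
        shrink = 1.0 - 1.0 / self.t         # equals 1 - eta * lam
        for f in self.w:
            self.w[f] *= shrink
        if margin < 1.0:                    # hinge loss is active
            for f, v in x.items():
                self.w[f] = self.w.get(f, 0.0) + eta * y * v


class ContextSensitiveClassifier:
    """One online SVM per LSH bucket, plus a global fallback model."""

    def __init__(self, lam=0.01):
        self.lam = lam
        self.global_model = OnlineLinearSVM(lam)
        self.context_models = {}

    def update(self, features, participants, label):
        key = minhash_signature(participants)
        model = self.context_models.setdefault(key, OnlineLinearSVM(self.lam))
        model.update(features, label)
        self.global_model.update(features, label)

    def predict(self, features, participants):
        key = minhash_signature(participants)
        model = self.context_models.get(key, self.global_model)
        return 1 if model.score(features) >= 0 else -1


# Toy stream: the word "apple" carries a different label in each social context.
stream = [
    ({"apple": 1.0, "iphone": 1.0}, {"alice", "bob"}, 1),
    ({"apple": 1.0, "pie": 1.0}, {"carol", "dave"}, -1),
] * 3

clf = ContextSensitiveClassifier()
for features, participants, label in stream:
    clf.update(features, participants, label)

pred_tech = clf.predict({"apple": 1.0}, {"alice", "bob"})
pred_food = clf.predict({"apple": 1.0}, {"carol", "dave"})
```

In this toy stream the same token is classified differently depending on which participants exchange the message, which is the kind of context sensitivity the abstract describes. Using the full signature as the bucket key is a simplification: a practical LSH index would typically band the signature so that similar, not only near-identical, participant sets share a bucket.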
