Scalable learning of collective behavior based on sparse social dimensions

The study of collective behavior seeks to understand how individuals behave in a social network environment. The oceans of data generated by social media such as Facebook, Twitter, Flickr, and YouTube present both opportunities and challenges for studying collective behavior on a large scale. In this work, we aim to learn to predict collective behavior in social media. In particular, given information about some individuals, how can we infer the behavior of unobserved individuals in the same network? We adopt a social-dimension-based approach to address the heterogeneity of connections in social media. However, networks in social media are typically of colossal size, involving hundreds of thousands or even millions of actors, and this scale demands scalable learning of models for collective behavior prediction. To address the scalability issue, we propose an edge-centric clustering scheme to extract sparse social dimensions. With sparse social dimensions, the social-dimension-based approach can efficiently handle networks of millions of actors while achieving prediction performance comparable to that of other, non-scalable methods.
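To make the idea of edge-centric clustering concrete, the following is a minimal sketch, not the authors' implementation. It assumes (the abstract does not say so) that each edge is represented as a sparse binary vector over its two endpoint nodes, that edges are grouped with a plain k-means variant using cosine similarity, and that a node's social dimensions are the clusters its edges fall into. The function names `edge_cluster` and `social_dimensions`, the initialization rule, and the toy graph are all hypothetical.

```python
from collections import defaultdict
from math import sqrt


def edge_cluster(edges, k, iters=20):
    """Group edges into k clusters with a simple k-means variant.

    Assumption: an edge (u, v) is the sparse vector with 1.0 at u and v.
    """
    # Deterministic, evenly spaced initial centroids (illustration only).
    step = max(1, len(edges) // k)
    centroids = [{u: 1.0, v: 1.0} for u, v in edges[::step][:k]]
    assign = [-1] * len(edges)

    def similarity(centroid, u, v):
        # Cosine similarity up to the constant edge norm sqrt(2).
        norm = sqrt(sum(w * w for w in centroid.values()))
        return (centroid.get(u, 0.0) + centroid.get(v, 0.0)) / norm

    for _ in range(iters):
        new_assign = [
            max(range(len(centroids)),
                key=lambda c: similarity(centroids[c], u, v))
            for u, v in edges
        ]
        if new_assign == assign:  # converged
            break
        assign = new_assign
        # Recompute each centroid as the mean of its member edge vectors.
        sums = [defaultdict(float) for _ in centroids]
        counts = [0] * len(centroids)
        for (u, v), c in zip(edges, assign):
            sums[c][u] += 1.0
            sums[c][v] += 1.0
            counts[c] += 1
        for c, total in enumerate(counts):
            if total:  # an empty cluster keeps its previous centroid
                centroids[c] = {n: s / total for n, s in sums[c].items()}
    return assign


def social_dimensions(edges, assign):
    """Map each node to the set of edge clusters it participates in."""
    dims = defaultdict(set)
    for (u, v), c in zip(edges, assign):
        dims[u].add(c)
        dims[v].add(c)
    return dict(dims)


# Toy graph: two triangles that share node 2.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (2, 4), (3, 4)]
assign = edge_cluster(edges, k=2)
dims = social_dimensions(edges, assign)
print(dims)
```

In this toy graph each triangle's edges form one cluster, so only the shared node 2 belongs to two dimensions. Under the stated assumptions a node can be involved in at most as many clusters as it has edges, which is one plausible reason the resulting representation stays sparse.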
