Scalable Learning of Collective Behavior

The study of collective behavior aims to understand how individuals behave in a social networking environment. The oceans of data generated by social media such as Facebook, Twitter, Flickr, and YouTube present both opportunities and challenges for studying collective behavior on a large scale. In this work, we aim to learn to predict collective behavior in social media. In particular, given information about some individuals, how can we infer the behavior of unobserved individuals in the same network? A social-dimension-based approach has been shown to be effective in addressing the heterogeneity of connections present in social media. However, networks in social media are normally of colossal size, involving hundreds of thousands of actors. The scale of these networks necessitates scalable learning of models for collective behavior prediction. To address the scalability issue, we propose an edge-centric clustering scheme to extract sparse social dimensions. With sparse social dimensions, the proposed approach can efficiently handle networks of millions of actors while achieving prediction performance comparable to that of other, nonscalable methods.
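To make the idea of edge-centric clustering concrete, the following is a minimal sketch, not the authors' implementation. It assumes that each edge is represented as a sparse vector with ones at its two endpoints, that edges are grouped by a plain k-means variant using cosine similarity, and that a node's social dimensions are the clusters of its incident edges. The function names (`edge_centric_clustering`, `social_dimensions`), the deterministic seeding, and the toy graph are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def edge_centric_clustering(edges, k, iters=20):
    """Cluster edges (not nodes) with a simple cosine-similarity k-means.

    Illustrative sketch only; the seeding and stopping rule are assumptions.
    """
    # Each edge (u, v) becomes a sparse vector with 1.0 at positions u and v.
    feats = [{u: 1.0, v: 1.0} for u, v in edges]
    # Deterministic seeding (assumption): pick seeds spread over the edge list.
    step = max(1, len(edges) // k)
    centroids = [dict(feats[(i * step) % len(edges)]) for i in range(k)]
    assign = [-1] * len(edges)
    for _ in range(iters):
        new_assign = []
        for f in feats:
            best, best_sim = 0, -1.0
            for c, cen in enumerate(centroids):
                norm = sum(x * x for x in cen.values()) ** 0.5
                dot = sum(w * cen.get(n, 0.0) for n, w in f.items())
                # Every edge vector has the same norm, so it is omitted here.
                sim = dot / norm if norm > 0 else 0.0
                if sim > best_sim:
                    best, best_sim = c, sim
            new_assign.append(best)
        if new_assign == assign:
            break  # assignments are stable, so the centroids are too
        assign = new_assign
        # Recompute each centroid as the mean of its member edge vectors.
        sums = [defaultdict(float) for _ in range(k)]
        counts = [0] * k
        for f, c in zip(feats, assign):
            counts[c] += 1
            for n, w in f.items():
                sums[c][n] += w
        for c in range(k):
            if counts[c]:
                centroids[c] = {n: s / counts[c] for n, s in sums[c].items()}
    return assign


def social_dimensions(edges, assign):
    """Map each node to the set of edge clusters it participates in."""
    dims = defaultdict(set)
    for (u, v), c in zip(edges, assign):
        dims[u].add(c)
        dims[v].add(c)
    return dict(dims)


# Toy graph: two triangles (0-1-2 and 3-4-5) joined by the edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
assign = edge_centric_clustering(edges, k=2)
dims = social_dimensions(edges, assign)
for node in sorted(dims):
    print(node, sorted(dims[node]))
```

Because a node can belong to at most as many edge clusters as it has connections, the resulting node-by-cluster affiliation matrix is sparse, which is the property the abstract relies on for scalability. In this sketch the affiliations could then be fed as features to any linear classifier.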
