Leveraging social media networks for classification

Social media has reshaped the way people interact with each other. The rapid development of the participatory web and of social networking sites such as YouTube, Twitter, and Facebook brings many data mining opportunities as well as novel challenges. In particular, we focus on classification tasks that exploit user interaction information in a social network. Networks in social media are heterogeneous, consisting of various types of relations. Since relation-type information may not be available in social media, most existing approaches treat these heterogeneous connections as homogeneous, which leads to unsatisfactory classification performance. To handle this network heterogeneity, we propose the concept of social dimensions to represent actors' latent affiliations, and develop a classification framework based on it. The proposed framework, SocioDim, first extracts social dimensions from the network structure to accurately capture prominent interaction patterns between actors, and then learns a discriminative classifier to select the relevant social dimensions. By distinguishing between different types of network connections, SocioDim outperforms representative existing methods for classification in social media, and it offers a simple yet effective way to integrate two seemingly orthogonal types of information: the network of actors and the actors' attributes.
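The two-step pipeline described above (extract social dimensions from the network, then train a discriminative classifier on them) can be sketched in plain Python. The abstract does not say how the social dimensions are extracted, so this sketch assumes they are the top-k eigenvectors of the modularity matrix B = A - d d^T / (2m), computed by shifted power iteration. It also uses a small logistic regression as a stand-in for the discriminative classifier. All function names (`modularity_matrix`, `social_dimensions`, `train_logistic`, `sociodim_classify`) are hypothetical, labels are assumed to be binary, and actor attributes, if supplied, are simply concatenated to each actor's social dimensions. Dense matrices keep the sketch short, so it only suits small graphs.

```python
import math
import random


def modularity_matrix(n, edges):
    """Dense B = A - d d^T / (2m) for an undirected, unweighted graph without self-loops."""
    adj = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        adj[u][v] = adj[v][u] = 1.0
    deg = [sum(row) for row in adj]
    two_m = sum(deg)
    B = [[adj[i][j] - deg[i] * deg[j] / two_m for j in range(n)] for i in range(n)]
    return B, deg


def social_dimensions(n, edges, k, iters=300, seed=0):
    """Top-k eigenvectors of B (assumed extraction method); row i holds actor i's k values."""
    B, deg = modularity_matrix(n, edges)
    # Every |eigenvalue| of B is at most its max absolute row sum, which is <= 2 * max degree,
    # so B + shift*I is positive semidefinite and power iteration finds B's largest eigenvalues.
    shift = 2.0 * max(deg)
    rng = random.Random(seed)
    vectors = []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        for _ in range(iters):
            w = [sum(B[i][j] * v[j] for j in range(n)) + shift * v[i] for i in range(n)]
            for u in vectors:  # project out the eigenvectors already found
                dot = sum(a * b for a, b in zip(w, u))
                w = [a - dot * b for a, b in zip(w, u)]
            norm = math.sqrt(sum(a * a for a in w))
            v = [a / norm for a in w]
        vectors.append(v)
    return [[vec[i] for vec in vectors] for i in range(n)]


def train_logistic(rows, labels, epochs=500, lr=1.0):
    """Binary (0/1) logistic regression fit by full-batch gradient descent."""
    dim, size = len(rows[0]), len(rows)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * dim, 0.0
        for x, y in zip(rows, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            err = 1.0 / (1.0 + math.exp(-z)) - y
            gw = [g + err * xi for g, xi in zip(gw, x)]
            gb += err
        w = [wi - lr * g / size for wi, g in zip(w, gw)]
        b -= lr * gb / size
    return w, b


def sociodim_classify(n, edges, labeled, k=1, attributes=None):
    """labeled maps node -> 0/1; returns predicted labels for every unlabeled node."""
    rows = social_dimensions(n, edges, k)
    if attributes is not None:  # integrate actor attributes by simple concatenation
        rows = [r + list(a) for r, a in zip(rows, attributes)]
    train = sorted(labeled)
    w, b = train_logistic([rows[i] for i in train], [labeled[i] for i in train])

    def score(x):
        return sum(wi * xi for wi, xi in zip(w, x)) + b

    return {i: int(score(rows[i]) > 0) for i in range(n) if i not in labeled}


# Two 4-cliques {0..3} and {4..7} joined by the single bridge 3-4.
edges = [(i, j) for i in range(4) for j in range(i + 1, 4)]
edges += [(i, j) for i in range(4, 8) for j in range(i + 1, 8)]
edges.append((3, 4))
print(sociodim_classify(8, edges, {0: 0, 1: 0, 6: 1, 7: 1}))  # {2: 0, 3: 0, 4: 1, 5: 1}
```

On this toy graph, the leading eigenvector of B takes one sign on each clique, so a single social dimension is enough for the classifier to separate the unlabeled actors. Projecting each new vector against those already found keeps the k dimensions orthonormal, and shifting by twice the maximum degree makes power iteration converge to the largest, rather than the largest-magnitude, eigenvalues of B.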
