Community Discovery via Metagraph Factorization

This work aims to discover community structure in rich media social networks through analysis of time-varying, multirelational data. Community structure represents the latent social context of user actions and has important applications such as search and recommendation. The problem is particularly relevant in the enterprise domain, where extracting emergent community structure from enterprise social media can help in forming new collaborative teams, in expertise discovery, and in the long-term reorganization of enterprises based on collaboration patterns. There are two unique challenges: (a) In social media, the context of user actions is constantly changing and coevolving; hence the social context contains time-evolving multidimensional relations. (b) The social context is determined by the available system features and is unique to each social media platform; hence the analysis of such data needs to flexibly incorporate various system features. In this article we propose MetaFac (MetaGraph Factorization), a framework that extracts community structures from dynamic, multidimensional social contexts and interactions. Our work makes three key contributions: (1) the metagraph, a novel relational hypergraph representation for modeling multirelational and multidimensional social data; (2) an efficient multirelational factorization method for community extraction on a given metagraph; (3) an online method to handle time-varying relations through incremental metagraph factorization. Extensive experiments on real-world social data collected from an enterprise and from the public Digg social media Web site suggest that our technique is scalable and is able to extract meaningful communities from social media contexts.
We illustrate the usefulness of our framework through two prediction tasks: (1) in the enterprise dataset, the task is to predict users’ future interests in tag usage, and (2) in the Digg dataset, the task is to predict users’ future interests in voting and commenting on Digg stories. Our predictions significantly outperform baseline methods (including the aspect model and tensor analysis), which suggests that metagraphs are a promising way to handle time-varying social relational contexts.
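
To make the idea of multirelational factorization concrete, the following is a minimal sketch and not the MetaFac algorithm itself. It assumes the simplest possible metagraph: two relations stored as nonnegative matrices that share one facet (users), here a hypothetical user-by-tag matrix and a user-by-story matrix. Each relation is approximated as U diag(z) V_r^T with a shared user factor U and community weights z, fitted by aspect-model-style EM updates that reduce the summed KL divergence. The function name `joint_factorize`, its parameters, and the toy data are all illustrative assumptions.

```python
import numpy as np

def joint_factorize(relations, k, iters=100, seed=0):
    """Jointly factorize matrices that share their row facet (e.g. users).

    Each relation X_r is approximated as U @ diag(z) @ V_r.T, where z and
    the columns of U and V_r are probability distributions.
    Returns (z, U, [V_r], losses). Illustrative sketch only.
    """
    rng = np.random.default_rng(seed)
    eps = 1e-12
    # Normalize every relation so its entries sum to one.
    xs = [np.asarray(x, dtype=float) for x in relations]
    xs = [x / x.sum() for x in xs]
    u = rng.random((xs[0].shape[0], k))
    u /= u.sum(axis=0)
    vs = []
    for x in xs:
        v = rng.random((x.shape[1], k))
        vs.append(v / v.sum(axis=0))
    z = np.full(k, 1.0 / k)
    losses = []
    for _ in range(iters):
        u_acc = np.zeros_like(u)
        new_vs = []
        loss = 0.0
        for x, v in zip(xs, vs):
            model = (u * z) @ v.T + eps
            # KL divergence of this relation under the current factors.
            loss += float(np.sum(x * np.log((x + eps) / model)))
            q = x / model
            # Expected counts per community, summed over the other facet.
            u_part = u * z * (q @ v)
            v_part = v * z * (q.T @ u)
            u_acc += u_part  # the shared facet pools evidence from all relations
            new_vs.append(v_part / (v_part.sum(axis=0) + eps))
        losses.append(loss)
        z = u_acc.sum(axis=0)
        z /= z.sum()
        u = u_acc / (u_acc.sum(axis=0) + eps)
        vs = new_vs
    return z, u, vs, losses

# Toy data (hypothetical): users 0-2 and users 3-5 behave differently.
user_tag = np.array([[5, 4, 0, 0], [4, 5, 0, 0], [5, 5, 0, 0],
                     [0, 0, 4, 5], [0, 0, 5, 4], [0, 0, 5, 5]])
user_story = np.array([[3, 0], [2, 0], [3, 0], [0, 2], [0, 3], [0, 3]])

z, u, vs, losses = joint_factorize([user_tag, user_story], k=2)
membership = (u * z) / (u * z).sum(axis=1, keepdims=True)  # p(community | user)
print(np.round(membership, 2))
```

Each row of `membership` is a soft community assignment for one user. The design choice worth noting is that the user factor is updated from the pooled statistics of both relations, which is what lets evidence from tagging and from story activity shape a single community structure.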
