Provably Fast Inference of Latent Features from Networks: with Applications to Learning Social Circles and Multilabel Classification

A well-known phenomenon in social networks is homophily, the tendency of agents to connect with similar agents. A consequence of this phenomenon is the emergence of communities. Another phenomenon observed in numerous networks is the existence of agents that belong simultaneously to multiple communities. Understanding these phenomena is a central research topic in network science. In this work we focus on a fundamental theoretical question related to these phenomena, with various applications: given an undirected graph G, can we efficiently infer the latent vertex features that explain the observed network structure, assuming a generative model that exhibits homophily? We propose a probabilistic generative model with the property that the probability of an edge between two vertices is a non-decreasing function of the features they have in common. This property holds for many real-world networks and, surprisingly, is ignored by many popular overlapping community detection methods, as the recent empirical work of Yang and Leskovec [44] showed. Our main theoretical contribution is the first provably rapidly mixing Markov chain for inferring latent features. On the experimental side, we verify the efficiency of our method in terms of running time and observe that it significantly outperforms state-of-the-art methods: it is more than 2,400 times faster than a state-of-the-art machine learning method [37] and typically provides non-trivial speedups over BigClam [43]. Furthermore, we verify on real data with available ground truth that our method efficiently learns high-quality labelings. We use our method to learn social circles from Twitter ego-networks and to perform multilabel classification.
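The abstract does not spell out the edge-probability function or the Markov chain, so the following is only a minimal sketch under stated assumptions, meant to make the two ideas concrete. As a hypothetical instance of a model whose edge probability is non-decreasing in shared features, it uses a form in the spirit of Yang and Leskovec's affiliation graph model, P(u ~ v) = 1 - (1 - eps) * prod over shared features c of (1 - p_feat[c]), with a background probability `eps` in (0, 1) and per-feature weights `p_feat[c]` in [0, 1). For inference it swaps in a generic single-site Metropolis chain that toggles one feature of one vertex per step; this is a standard textbook sampler, not the paper's chain, and it carries no mixing-time guarantee. All function and parameter names are illustrative.

```python
import math
import random
from itertools import combinations


def edge_prob(fu, fv, p_feat, eps):
    """Hypothetical edge probability for vertices with feature sets fu, fv.

    1 - (1 - eps) * prod_{c in fu & fv} (1 - p_feat[c]). Each shared feature
    multiplies the non-edge probability by a factor <= 1, so the edge
    probability never decreases when a shared feature is added.
    """
    q = 1.0 - eps
    for c in fu & fv:
        q *= 1.0 - p_feat[c]
    return 1.0 - q


def sample_graph(features, p_feat, eps, rng):
    """Draw each pair (u, v), u < v, independently as an edge."""
    edges = set()
    for u, v in combinations(range(len(features)), 2):
        if rng.random() < edge_prob(features[u], features[v], p_feat, eps):
            edges.add((u, v))
    return edges


def log_likelihood(features, edges, p_feat, eps):
    """Log-probability of the observed edge set given all feature sets."""
    ll = 0.0
    for u, v in combinations(range(len(features)), 2):
        p = edge_prob(features[u], features[v], p_feat, eps)
        ll += math.log(p) if (u, v) in edges else math.log(1.0 - p)
    return ll


def metropolis_step(features, edges, p_feat, eps, k, rng):
    """Propose toggling feature c (of k) at vertex u; accept with min(1, e^delta).

    The proposal is symmetric, so this targets the posterior under a uniform
    prior on feature assignments. Only pairs involving u change, so the
    log-likelihood difference delta is computed in O(n). Returns (accepted, delta).
    """
    u = rng.randrange(len(features))
    c = rng.randrange(k)
    old = features[u]
    new = old ^ {c}  # symmetric difference toggles feature c
    delta = 0.0
    for v in range(len(features)):
        if v == u:
            continue
        p_old = edge_prob(old, features[v], p_feat, eps)
        p_new = edge_prob(new, features[v], p_feat, eps)
        if (min(u, v), max(u, v)) in edges:
            delta += math.log(p_new) - math.log(p_old)
        else:
            delta += math.log(1.0 - p_new) - math.log(1.0 - p_old)
    accepted = delta >= 0 or rng.random() < math.exp(delta)
    if accepted:
        features[u] = new
    return accepted, delta


if __name__ == "__main__":
    rng = random.Random(0)
    k, eps, p_feat = 2, 0.05, [0.7, 0.7]
    # Hypothetical planted features: two disjoint groups of six vertices.
    true_feats = [{0} for _ in range(6)] + [{1} for _ in range(6)]
    edges = sample_graph(true_feats, p_feat, eps, rng)
    feats = [set() for _ in range(len(true_feats))]  # start with no features
    for _ in range(2000):
        metropolis_step(feats, edges, p_feat, eps, k, rng)
    print(log_likelihood(true_feats, edges, p_feat, eps),
          log_likelihood(feats, edges, p_feat, eps))
```

Because feature indices are exchangeable, a recovered labeling can match the planted one only up to a permutation of indices. Here `p_feat` and `eps` are held fixed rather than learned, which keeps the sketch focused on the feature-inference step.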

[1] Jure Leskovec, et al. Overlapping community detection at scale: a nonnegative matrix factorization approach, 2013, WSDM.

[2] Jure Leskovec, et al. Learning to Discover Social Circles in Ego Networks, 2012, NIPS.

[3] Jure Leskovec, et al. Overlapping Communities Explain Core–Periphery Organization of Networks, 2014, Proceedings of the IEEE.

[4] Frank McSherry, et al. Spectral partitioning of random graphs, 2001, Proceedings 2001 IEEE International Conference on Cluster Computing.

[5] R. Lambiotte, et al. Line graphs, link partitions, and overlapping communities, 2009, Physical Review E: Statistical, Nonlinear, and Soft Matter Physics.

[6] V. Climenhaga. Markov chains and mixing times, 2013.

[7] C. D. Gelatt, et al. Optimization by Simulated Annealing, 1983, Science.

[8] Jon M. Kleinberg, et al. Subgraph frequencies: mapping the empirical and extremal geography of large graph collections, 2013, WWW.

[9] Thomas L. Griffiths, et al. Nonparametric Latent Feature Models for Link Prediction, 2009, NIPS.

[10] Thomas L. Griffiths, et al. Infinite latent feature models and the Indian buffet process, 2005, NIPS.

[11] Vahab S. Mirrokni, et al. On the Advantage of Overlapping Clusters for Minimizing Conductance, 2014, Algorithmica.

[12] David M. Blei, et al. Efficient discovery of overlapping communities in massive networks, 2013, Proceedings of the National Academy of Sciences.

[13] Ben Morris, et al. Mixing time of the card-cyclic-to-random shuffle, 2012.

[14] Christos Faloutsos, et al. It's who you know: graph mining using recursive structural features, 2011, KDD.

[15] Robert E. Tarjan, et al. Decomposition by clique separators, 1985, Discrete Mathematics.

[16] Charalampos E. Tsourakakis, et al. Denser than the densest subgraph: extracting optimal quasi-cliques with quality guarantees, 2013, KDD.

[17] Zoubin Ghahramani, et al. Correlated Non-Parametric Latent Feature Models, 2009, UAI.

[18] Yong Wang, et al. Overlapping Community Detection in Complex Networks using Symmetric Binary Matrix Factorization, 2013, Physical Review E: Statistical, Nonlinear, and Soft Matter Physics.

[19] Alain Guénoche, et al. Multifunctional proteins revealed by overlapping clustering in protein interaction network, 2011, Bioinformatics.

[20] T. Nepusz, et al. Fuzzy communities and the concept of bridgeness in complex networks, 2007, Physical Review E: Statistical, Nonlinear, and Soft Matter Physics.

[21] Thomas L. Griffiths, et al. Learning Systems of Concepts with an Infinite Relational Model, 2006, AAAI.

[22] Aristides Gionis, et al. Overlapping correlation clustering, 2011, 2011 IEEE 11th International Conference on Data Mining.

[23] D. Mumford. Pattern theory: a unifying perspective, 1996.

[24] T. Vicsek, et al. Uncovering the overlapping community structure of complex networks in nature and society, 2005, Nature.

[25] Stephen Roberts, et al. Overlapping community detection using Bayesian non-negative matrix factorization, 2011, Physical Review E: Statistical, Nonlinear, and Soft Matter Physics.

[26] Anthony Wirth, et al. Correlation Clustering, 2010, Encyclopedia of Machine Learning and Data Mining.

[27] Silvio Lattanzi, et al. Affiliation networks, 2009, STOC '09.

[28] Flávio Keidi Miyazawa, et al. Evolutionary algorithms for overlapping correlation clustering, 2014, GECCO.

[29] Martin E. Dyer, et al. Path coupling: A technique for proving rapid mixing in Markov chains, 1997, Proceedings 38th Annual Symposium on Foundations of Computer Science.

[30] Edoardo M. Airoldi, et al. Mixed Membership Stochastic Blockmodels, 2007, NIPS.

[31] Charalampos E. Tsourakakis. Toward Quantifying Vertex Similarity in Networks, 2011, Internet Mathematics.

[32] A. Folkesson. IT and society, 2013.

[33] Silvio Lattanzi, et al. The Power of Random Neighbors in Social Networks, 2015, WSDM.

[34] Illés J. Farkas, et al. CFinder: locating cliques and overlapping modules in biological networks, 2006, Bioinformatics.

[35] Sanjeev Arora, et al. Computing a nonnegative matrix factorization -- provably, 2011, STOC '12.

[36] Sebastiano Vigna, et al. The webgraph framework I: compression techniques, 2004, WWW '04.

[37] Jure Leskovec, et al. Statistical properties of community structure in large social and information networks, 2008, WWW.

[38] Zoubin Ghahramani, et al. Accelerated sampling for the Indian Buffet Process, 2009, ICML '09.

[39] D. Aldous. Exchangeability and related topics, 1985.

[40] Mark Braverman, et al. Finding Endogenously Formed Communities, 2012, SODA.

[41] Zoubin Ghahramani, et al. An Infinite Latent Attribute Model for Network Data, 2012, ICML.

[42] Christos Faloutsos, et al. oddball: Spotting Anomalies in Weighted Graphs, 2010, PAKDD.

[43] Paul A. Bates, et al. Global topological features of cancer proteins in the human interactome, 2006, Bioinformatics.

[44] Vahab Mirrokni, et al. Overlapping clusters for distributed computation, 2012, WSDM '12.

[45] Sanjeev Arora, et al. Finding overlapping communities in social networks: toward a rigorous approach, 2011, EC '12.