The impossibility of low-rank representations for triangle-rich complex networks

Significance

Our main message is that the popular method of low-dimensional embeddings provably cannot capture important properties of real-world complex networks. A widely used algorithmic technique for modeling these networks is to construct a low-dimensional Euclidean embedding of the vertices of the network, where proximity of vertices is interpreted as the likelihood of an edge. Contrary to common wisdom, we argue that such graph embeddings do not capture salient properties of complex networks. We mathematically prove that low-dimensional embeddings cannot generate graphs with both low average degree and large clustering coefficients, which have been widely established to be empirically true for real-world networks. This establishes that popular low-dimensional embedding methods fail to capture significant structural aspects of real-world complex networks.

The study of complex networks is a significant development in modern science, and has enriched the social sciences, biology, physics, and computer science. Models and algorithms for such networks are pervasive in our society, and impact human behavior via social networks, search engines, and recommender systems, to name a few. A widely used algorithmic technique for modeling such complex networks is to construct a low-dimensional Euclidean embedding of the vertices of the network, where proximity of vertices is interpreted as the likelihood of an edge. Contrary to the common view, we argue that such graph embeddings do not capture salient properties of complex networks. The two properties we focus on are low degree and large clustering coefficients, which have been widely established to be empirically true for real-world networks. We mathematically prove that any embedding (that uses dot products to measure similarity) that can successfully create these two properties must have a rank that is nearly linear in the number of vertices. Among other implications, this establishes that popular embedding techniques such as singular value decomposition and node2vec fail to capture significant structural aspects of real-world complex networks. Furthermore, we empirically study a number of different embedding techniques based on dot product, and show that they all fail to capture the triangle structure.
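To make the two quantities in this claim concrete, the following is a minimal sketch of a dot-product graph model, not the paper's own code. It assumes that each vertex has an embedding vector (a row of a hypothetical matrix `V`), that the probability of an edge between two vertices is their dot product clipped to [0, 1], and that edges are drawn independently. The function names are illustrative only. Under these assumptions, the expected average degree and the expected number of triangles follow directly from the probability matrix.

```python
import numpy as np

def edge_probabilities(V):
    """Edge-probability matrix for an n x d embedding V (assumed model:
    dot product of embeddings, clipped to [0, 1], no self-loops)."""
    P = np.clip(V @ V.T, 0.0, 1.0)
    np.fill_diagonal(P, 0.0)  # a vertex has no edge to itself
    return P

def expected_average_degree(P):
    """Expected average degree: each row sum is a vertex's expected degree."""
    return P.sum() / P.shape[0]

def expected_triangles(P):
    """Expected triangle count under independent edges.
    For symmetric P with zero diagonal, trace(P^3) counts every
    unordered triple {i, j, k} six times."""
    return np.trace(P @ P @ P) / 6.0

# Three vertices sharing the same 1-dimensional embedding: every pair is
# connected with probability 1, so the graph is a single triangle.
V = np.ones((3, 1))
P = edge_probabilities(V)
print(expected_average_degree(P), expected_triangles(P))
```

The rank of `V` is at most its number of columns `d`, so in this sketch the result above says that a small `d` cannot yield both a low expected average degree and many expected triangles among low-degree vertices.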
