A unified data representation theory for network visualization, ordering and coarse-graining

The representation of large data sets has become a key question in many scientific disciplines over the last decade. Several approaches to network visualization, data ordering and coarse-graining have addressed this challenge. However, no underlying theoretical framework has linked these problems. Here we present an elegant, information-theoretic data representation approach as a unified solution to network visualization, data ordering and coarse-graining. The optimal representation is the one hardest to distinguish from the original data matrix, as measured by the relative entropy. Representing network nodes as probability distributions provides an efficient visualization method and, in one dimension, an ordering of network nodes and edges. Coarse-grained representations of the input network enable both efficient data compression and hierarchical visualization, yielding high-quality representations of larger data sets. Our unified data representation theory will aid the analysis of extensive data sets by revealing the large-scale structure of complex networks in a comprehensible form.
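The relative-entropy criterion and the one-dimensional ordering described above can be illustrated with a small sketch. It rests on simplifying assumptions that are not taken from the paper: the adjacency matrix P and the representation matrix Q are each normalized to sum to one, every node is a one-dimensional Gaussian of common width `sigma`, and the entry of Q for a node pair is the overlap integral of their two Gaussians. The helper names (`normalize`, `relative_entropy`, `gaussian_overlap_matrix`, `best_ordering`) are hypothetical. The brute-force search over orderings stands in for whatever optimizer the authors use and is only feasible for a handful of nodes.

```python
import itertools
import math


def normalize(matrix):
    """Scale a non-negative matrix so that its entries sum to one."""
    total = sum(sum(row) for row in matrix)
    return [[value / total for value in row] for row in matrix]


def relative_entropy(p, q):
    """D(P || Q) = sum_ij p_ij * ln(p_ij / q_ij), using the convention 0 * ln 0 = 0."""
    d = 0.0
    for p_row, q_row in zip(p, q):
        for p_ij, q_ij in zip(p_row, q_row):
            if p_ij > 0:
                d += p_ij * math.log(p_ij / q_ij)
    return d


def gaussian_overlap_matrix(positions, sigma=1.0):
    """Assumed representation: node i is the 1-D density N(x_i, sigma^2), and
    entry (i, j) is the overlap integral of the two densities,
    exp(-(x_i - x_j)^2 / (4 sigma^2)) / sqrt(4 pi sigma^2)."""
    norm = 1.0 / math.sqrt(4.0 * math.pi * sigma ** 2)
    return [
        [norm * math.exp(-((x_i - x_j) ** 2) / (4.0 * sigma ** 2)) for x_j in positions]
        for x_i in positions
    ]


def best_ordering(adjacency, sigma=1.0):
    """Brute-force 1-D ordering (illustration only): assign the nodes to the
    positions 0..n-1 in every possible way and keep the assignment whose
    normalized representation Q minimizes D(P || Q). Cost grows as n!."""
    p = normalize(adjacency)
    best_positions, best_d = None, math.inf
    for positions in itertools.permutations(range(len(adjacency))):
        q = normalize(gaussian_overlap_matrix(positions, sigma))
        d = relative_entropy(p, q)
        if d < best_d:
            best_positions, best_d = positions, d
    return best_positions


# Usage: a path graph 0-1-2-3 (unweighted, undirected).
path = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
P = normalize(path)
d_good = relative_entropy(P, normalize(gaussian_overlap_matrix([0, 1, 2, 3])))
d_bad = relative_entropy(P, normalize(gaussian_overlap_matrix([0, 2, 1, 3])))
print(d_good < d_bad)        # True: linked nodes placed side by side fit better
print(best_ordering(path))   # position assigned to each node
```

Because every candidate ordering reuses the same set of positions, the normalization constant of Q is identical for all of them, so in this toy setting minimizing D(P || Q) reduces to minimizing the P-weighted sum of squared position differences over the edges. A realistic implementation would optimize continuous positions (and possibly widths) rather than enumerate permutations.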
