Revealing the predictability of intrinsic structure in complex networks

Structure prediction is an important and widely studied problem in network science and machine learning, with applications in various fields. Despite significant progress in prediction algorithms, the fundamental predictability of structures remains unclear, because the complex dynamics underlying network formation are usually unobserved or difficult to describe. As a result, there has been little theoretical guidance for the practical development of algorithms regarding their absolute performance. Here, for the first time, we find that the normalized shortest compression length of a network structure can directly assess its structure predictability. Specifically, the shorter the binary string produced by compression, the higher the structure predictability, and the relationship between the two is linear. We also analytically derive the origin of this linear relationship in artificial random networks. In addition, our finding leads to analytical results quantifying the maximum prediction accuracy, and allows the potential value of a network dataset to be estimated from the size of its compressed data file.

The likelihood of a link forming within a complex network is important for solving real-world problems, but it is challenging to predict. Sun et al. show that the link predictability limit can be well estimated by measuring the shortest compression length of a network, without the need for a prediction algorithm.
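
The idea that a shorter compressed encoding signals a more predictable structure can be illustrated with a rough sketch. This is not the compression scheme used in the paper, which the abstract does not specify: here the general-purpose `zlib` compressor stands in for the shortest compression length (it only gives an upper bound, and the result depends on node ordering), and the normalization by the number of node pairs is an assumption made for illustration. All function names are hypothetical.

```python
import random
import zlib


def upper_triangle_bits(adj):
    """Flatten the strict upper triangle of a symmetric 0/1 adjacency matrix."""
    n = len(adj)
    return [adj[i][j] for i in range(n) for j in range(i + 1, n)]


def pack_bits(bits):
    """Pack 0/1 values into bytes, 8 per byte, zero-padding the last byte."""
    out = bytearray()
    for start in range(0, len(bits), 8):
        chunk = bits[start:start + 8]
        byte = 0
        for b in chunk:
            byte = (byte << 1) | b
        byte <<= 8 - len(chunk)
        out.append(byte)
    return bytes(out)


def normalized_compression_length(adj):
    """Compressed size in bits divided by the number of node pairs.

    Assumption: zlib is a crude proxy for the shortest compression length,
    and dividing by the raw bit count is an illustrative normalization.
    """
    bits = upper_triangle_bits(adj)
    compressed = zlib.compress(pack_bits(bits), 9)
    return 8 * len(compressed) / len(bits)


def random_graph(n, p, seed):
    """Undirected random graph: each pair is linked independently with probability p."""
    rng = random.Random(seed)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i][j] = adj[j][i] = 1
    return adj


def ring_lattice(n, k):
    """Regular ring where each node links to its k nearest neighbours on each side."""
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for d in range(1, k + 1):
            j = (i + d) % n
            adj[i][j] = adj[j][i] = 1
    return adj


if __name__ == "__main__":
    print("ring lattice:", normalized_compression_length(ring_lattice(200, 2)))
    print("random p=0.5:", normalized_compression_length(random_graph(200, 0.5, seed=1)))
```

In this sketch the highly regular ring lattice compresses to far fewer bits per node pair than the random graph with p = 0.5, which is close to incompressible. This matches the qualitative claim that regular, more predictable structures have shorter compression lengths. Mapping such a number onto an actual prediction accuracy requires the analytical relationship derived in the paper, which is not reproduced here.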
