Data Dimension Reduction and Network Sparsification Based on Minimal Algorithmic Information Loss

We introduce a family of unsupervised, domain-free, and (asymptotically) model-independent algorithms, based on the principles of algorithmic information theory, that are designed to minimize the loss of algorithmic information. The method coarse-grains data algorithmically by collapsing regions that can be procedurally regenerated from the compressed version. We show that the method can preserve the salient properties of objects and structures during data dimension reduction and denoising. Using efficient (polynomial-time), if suboptimal, estimations of algorithmic complexity obtained from recent numerical methods based on algorithmic probability, we demonstrate how these algorithms preserve structural properties and outperform other algorithms, for example in network dimension reduction. As a case study, we report that the method preserves all the graph-theoretic indices measured on a well-known set of synthetic and real-world networks of very different nature. The indices range from degree distribution and clustering coefficient to edge betweenness and degree and eigenvector centralities, and the results are equal to or significantly better than those of other data reduction methods and some of the leading network sparsification methods.
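The idea of sparsifying a network while losing as little algorithmic information as possible can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: the zlib-compressed length of the adjacency bits stands in as a crude proxy for algorithmic complexity (the abstract refers instead to numerical methods based on algorithmic probability), the greedy rule removes the edge whose deletion changes that estimate the least, and the names complexity_proxy and sparsify are hypothetical.

```python
import zlib
from itertools import combinations


def complexity_proxy(n, edges):
    """Compressed length in bytes of the upper-triangular adjacency bits.

    Assumption: zlib is only a rough stand-in for an algorithmic
    complexity estimator; it is not the estimator used by the authors.
    """
    edge_set = {frozenset(e) for e in edges}
    bits = "".join(
        "1" if frozenset((i, j)) in edge_set else "0"
        for i, j in combinations(range(n), 2)
    )
    return len(zlib.compress(bits.encode("ascii"), 9))


def sparsify(n, edges, n_remove):
    """Greedily delete n_remove edges from an undirected graph on n nodes.

    At each step, remove the edge whose deletion changes the complexity
    estimate the least (ties are broken by edge order).
    """
    edges = [tuple(sorted(e)) for e in edges]
    for _ in range(n_remove):
        if not edges:
            break
        base = complexity_proxy(n, edges)

        def information_shift(edge):
            remaining = [x for x in edges if x != edge]
            return abs(base - complexity_proxy(n, remaining))

        best = min(edges, key=lambda e: (information_shift(e), e))
        edges.remove(best)
    return edges


# Usage: an 8-node ring with two chords, reduced by three edges.
ring = [(i, (i + 1) % 8) for i in range(8)] + [(0, 4), (2, 6)]
sparse = sparsify(8, ring, 3)
print(len(ring), "->", len(sparse), "edges")
```

A general-purpose compressor is insensitive on graphs this small, so the sketch conveys only the structure of the greedy loop, not the quality of the results reported above.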
