Algorithmic Probability-guided Supervised Machine Learning on Non-differentiable Spaces

We show how algorithmic complexity theory can be introduced in machine learning to help bring together apparently disparate areas of current research. We show that this new approach requires less training data and is more generalizable, exhibiting greater resilience to random attacks. We investigate the shape of the discrete algorithmic space when performing regression or classification using a loss function parametrized by algorithmic complexity, demonstrating that differentiability is not necessary to achieve results similar to those obtained using differentiable programming approaches such as deep learning. In doing so we use examples small enough to allow the two approaches to be compared, given the computational power required to estimate algorithmic complexity. We find and report that (i) machine learning can successfully be performed on a non-smooth surface using algorithmic complexity; (ii) parameter solutions can be found using an algorithmic-probability classifier, establishing a bridge between a fundamentally discrete theory of computability and a fundamentally continuous theory of mathematical optimization; (iii) an algorithmically directed search technique over non-smooth manifolds can be formulated and conducted; and (iv) exploitation techniques and numerical methods for algorithmic search can be used to navigate these discrete, non-differentiable spaces. We apply these results to (a) the identification of generative rules from data observations; (b) image classification problems that are more resilient to pixel attacks than neural networks; (c) the identification of equation parameters from a small data set in the presence of noise, in a continuous ODE system; and (d) the classification of Boolean NK networks by (1) network topology, (2) underlying Boolean function, and (3) number of incoming edges.
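To make the approach concrete, the following is a minimal Python sketch of a loss function parametrized by algorithmic complexity, minimized by directed search over a discrete, non-differentiable parameter space. This is not the paper's implementation: the zlib compressed length stands in for the CTM/BDM complexity estimates the paper relies on, and the linear toy model, the 8-bit parameter encoding, and the penalty weight lam are illustrative assumptions.

```python
import zlib
import numpy as np

def complexity(bits: np.ndarray) -> int:
    # Crude stand-in for an algorithmic-complexity estimate: length of
    # the zlib-compressed bit string. The paper uses CTM/BDM estimates;
    # any estimator with this interface could be substituted here.
    return len(zlib.compress(np.packbits(bits).tobytes()))

def encode_params(theta):
    # Toy binary encoding of candidate parameters (8 bits each, wrapping
    # negatives) so their description length can be estimated.
    return np.unpackbits(np.asarray(theta).astype(np.uint8))

def algorithmic_loss(theta, x, y, lam=0.05):
    # Data-fit term plus a complexity penalty on the parameters.
    # Non-differentiable in theta, so it is minimized by discrete search.
    pred = theta[0] + theta[1] * x  # hypothetical linear toy model
    return float(np.mean((pred - y) ** 2)) + lam * complexity(encode_params(theta))

def greedy_discrete_search(x, y, start=(0, 0), steps=200):
    # Directed search on the integer lattice: move to the best
    # neighbouring parameter vector until no neighbour improves.
    theta = np.array(start, dtype=int)
    best = algorithmic_loss(theta, x, y)
    for _ in range(steps):
        neighbours = [theta + d for d in ([1, 0], [-1, 0], [0, 1], [0, -1])]
        losses = [algorithmic_loss(n, x, y) for n in neighbours]
        i = int(np.argmin(losses))
        if losses[i] >= best:
            break  # local optimum on the lattice
        theta, best = neighbours[i], losses[i]
    return theta, best

rng = np.random.default_rng(0)
x = np.arange(20)
y = 3 * x + 7 + rng.normal(0, 0.5, size=20)  # noisy line; true params (7, 3)
print(greedy_discrete_search(x, y))
```

Because the complexity penalty changes in discrete jumps as the parameter encoding changes, the objective has no useful gradient; the lattice search above is one simple way to navigate such a space.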

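The algorithmic-probability classifier of point (ii) admits an equally small sketch, again with compressed length as an assumed stand-in for algorithmic complexity: a string is assigned to the class whose observed examples make it cheapest to describe, approximating the conditional complexity of the string given each class. The class prototypes and test strings below are hypothetical toy data.

```python
import zlib

def c(b: bytes) -> int:
    # Compressed length as a crude proxy for algorithmic complexity.
    return len(zlib.compress(b))

def classify(x: bytes, prototypes: dict) -> str:
    # Assign x to the class whose data makes it cheapest to describe:
    # argmin over classes of C(prototype + x) - C(prototype), an
    # approximation of the conditional complexity K(x | class).
    return min(prototypes, key=lambda k: c(prototypes[k] + x) - c(prototypes[k]))

# Hypothetical toy classes: two simple generative rules for bit strings.
prototypes = {
    "periodic": b"01" * 64,        # rule A: alternate bits
    "blocky":   b"00001111" * 16,  # rule B: repeat 4-bit blocks
}

print(classify(b"01" * 8, prototypes))        # expected: periodic
print(classify(b"00001111" * 2, prototypes))  # expected: blocky
```

Swapping the zlib proxy for a CTM/BDM estimator recovers the flavour of the classifier studied in the paper; the zlib version is only reliable on inputs long enough for the compressor to find structure.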
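Finally, the algorithmic-probability estimates themselves can be illustrated in miniature, following the coding-theorem route: enumerate a family of small programs, tally how often each output is produced, and estimate K(s) as -log2 of the output frequency of s. In this sketch the 256 elementary cellular automaton rules stand in for the Turing-machine enumeration used by the Coding Theorem Method; the rule family, tape width, and step count are illustrative assumptions.

```python
from collections import Counter
from math import log2

def eca_step(cells, rule):
    # One step of an elementary cellular automaton with wrap-around:
    # the rule's bit at index (left*4 + center*2 + right) gives the new cell.
    n = len(cells)
    return tuple(
        (rule >> (cells[(i - 1) % n] * 4 + cells[i] * 2 + cells[(i + 1) % n])) & 1
        for i in range(n)
    )

def output_distribution(width=8, steps=8):
    # Run every rule from a single-1 initial condition and tally the rows
    # produced, approximating an output frequency distribution m(s).
    counts = Counter()
    for rule in range(256):
        cells = tuple(1 if i == width // 2 else 0 for i in range(width))
        for _ in range(steps):
            cells = eca_step(cells, rule)
            counts[cells] += 1
    return counts

counts = output_distribution()
total = sum(counts.values())

def k_estimate(s):
    # Coding-theorem-style estimate: K(s) is approximately -log2 m(s).
    return -log2(counts[s] / total)

simple = tuple([0] * 8)             # all-zero row: produced by many rules
rare = min(counts, key=counts.get)  # least frequently produced row
print(k_estimate(simple))           # low complexity estimate
print(k_estimate(rare))             # higher complexity estimate
```

Frequently produced outputs receive low complexity estimates and rarely produced ones high estimates, which is the resource-bounded behaviour the universal distribution predicts.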