Deep Learning Works in Practice. But Does it Work in Theory?

Deep learning relies on a very specific kind of neural network: one that stacks several layers of neurons. In the last few years, deep learning has achieved major breakthroughs in tasks such as image analysis, speech recognition, and natural language processing. Yet there is no theoretical explanation of this success; in particular, it is not clear why deeper networks perform better. We argue that the explanation is intimately connected to a key feature of the data collected from our surrounding universe to feed machine learning algorithms: large non-parallelizable logical depth. Roughly speaking, we conjecture that the shortest computational descriptions of the universe are algorithms with inherently large computation times, even when a large number of computers is available for parallelization. Interestingly, this conjecture, combined with the folklore conjecture in theoretical computer science that $P \neq NC$, explains the success of deep learning.
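For reference, a minimal sketch of the notion we appeal to, following Bennett's logical depth (the exact formulation varies across presentations; here $U$ is a universal prefix machine, $K(x)$ the Kolmogorov complexity of $x$, $t(p)$ the running time of program $p$, and $s$ a significance parameter):

$$\mathrm{depth}_s(x) \;=\; \min \bigl\{\, t(p) \;:\; U(p) = x \ \text{and}\ |p| \le K(x) + s \,\bigr\}.$$

Intuitively, a string of large logical depth admits short descriptions, yet every near-shortest program producing it runs for a long time; and, assuming $P \neq NC$, that running time cannot be substantially reduced by parallelization.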
