Deep Learning and Hierarchical Generative Models

It is argued that deep learning is efficient for data generated from hierarchical generative models. Examples of such generative models include wavelet scattering networks, functions of compositional structure, and deep rendering models. Unfortunately, for all such models it is either not rigorously known that they can be learned efficiently, or it is not known that "deep" algorithms are required in order to learn them. We propose a simple family of hierarchical generative models that can be learned efficiently and for which "deep" algorithms are necessary for learning. Our definition of "deep" algorithms is based on the empirical observation that deep nets necessarily use correlations between features. More formally, we show that in a semi-supervised setting, given access to low-order moments of the labeled data and to all of the unlabeled data, it is information-theoretically impossible to perform classification, while at the same time there is an efficient algorithm that, given all labeled and unlabeled data, perfectly labels all unlabeled data with high probability. For the proof, we use and strengthen the fact that Belief Propagation does not admit a good approximation in terms of linear functions.
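To make the setting concrete, below is a minimal Python sketch of the kind of hierarchical generative model and Belief Propagation inference referred to above, assuming a broadcast process on a complete binary tree with a binary symmetric channel and a uniform root prior. The depth, flip probability, and helper names (`broadcast`, `bp_root_marginal`, `flip_p`) are illustrative assumptions, not the paper's exact construction.

```python
# A minimal sketch (illustrative assumptions, not the paper's exact model):
# a hidden root label is broadcast down a complete binary tree, each edge
# flipping the bit independently with probability flip_p; Belief Propagation
# (sum-product) then infers the root label from the leaves.
import random


def broadcast(depth, flip_p, root=None):
    """Generate one sample: (root_label, list_of_leaf_labels)."""
    if root is None:
        root = random.randint(0, 1)  # uniform prior on the root
    level = [root]
    for _ in range(depth):
        nxt = []
        for x in level:
            for _ in range(2):  # two children per node
                nxt.append(x ^ (random.random() < flip_p))  # noisy copy
        level = nxt
    return root, level


def bp_root_marginal(leaves, flip_p):
    """Exact bottom-up sum-product on the complete binary tree.

    Each message is the (normalized) likelihood of the leaves below a node
    given that node's value; returns P(root = 1 | leaves).
    """
    msgs = [(1.0 - y, float(y)) for y in leaves]  # point mass at each leaf
    while len(msgs) > 1:
        nxt = []
        for i in range(0, len(msgs), 2):
            up = [1.0, 1.0]
            for c in (msgs[i], msgs[i + 1]):
                for parent_val in (0, 1):
                    # marginalize the child over the symmetric channel
                    same, diff = c[parent_val], c[1 - parent_val]
                    up[parent_val] *= (1.0 - flip_p) * same + flip_p * diff
            z = up[0] + up[1]
            nxt.append((up[0] / z, up[1] / z))  # normalize; ratios preserved
        msgs = nxt
    return msgs[0][1]


if __name__ == "__main__":
    random.seed(0)
    depth, flip_p, trials = 8, 0.1, 2000
    correct = 0
    for _ in range(trials):
        root, leaves = broadcast(depth, flip_p)
        correct += int((bp_root_marginal(leaves, flip_p) > 0.5) == bool(root))
    print(f"BP root-recovery accuracy over {trials} samples: {correct / trials:.3f}")
```

Because the tree is complete and the channel symmetric, the sum-product recursion can be run level by level in a single bottom-up pass over the leaves. With the flip probability 0.1 chosen here, which lies below the Kesten-Stigum threshold p < (1 - 1/sqrt(2))/2 for the binary symmetric channel on a binary tree, the root is recoverable noticeably better than chance.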
