Geometric Understanding of Deep Learning

Deep learning is the mainstream technique for many machine learning tasks, including image recognition, machine translation, and speech recognition. It has outperformed conventional methods in many fields, yet the understanding of how it works remains unclear, and laying down a theoretical foundation for deep learning is of central importance. In this work, we offer a geometric view of deep learning: we argue that the fundamental principle behind its success is the manifold structure of data, namely that natural high-dimensional data concentrates near a low-dimensional manifold, and that deep learning learns both the manifold and the probability distribution on it. We further introduce two notions of rectified linear complexity: one for a deep neural network, measuring its learning capability, and one for an embedded manifold, measuring the difficulty of learning it. We then show that for any deep neural network with a fixed architecture, there exists a manifold that the network cannot learn. Finally, we propose applying optimal mass transportation theory to control the probability distribution in the latent space.
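The two ingredients of this view can be illustrated on a toy example: data in a high-dimensional ambient space concentrated near a low-dimensional manifold, and an optimal transport plan matching a reference latent distribution to the data distribution. The sketch below uses entropic-regularized (Sinkhorn) optimal transport between empirical measures; this is only a minimal stand-in for illustration, not the semi-discrete solver the work itself refers to, and all names and parameters here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Data: n points in R^3 concentrated near a circle (a 1-D manifold),
# i.e. high ambient dimension, low intrinsic dimension.
n = 64
t = rng.uniform(0.0, 2.0 * np.pi, n)
data = np.stack([np.cos(t), np.sin(t), 0.05 * rng.standard_normal(n)], axis=1)

# Latent sample: n points drawn uniformly from the cube [-1, 1]^3.
latent = rng.uniform(-1.0, 1.0, size=(n, 3))

# Squared-Euclidean cost between every latent point and every data point.
C = ((latent[:, None, :] - data[None, :, :]) ** 2).sum(axis=2)

def sinkhorn(C, eps=0.1, iters=2000):
    """Entropic-regularized OT plan between two uniform empirical measures."""
    n, m = C.shape
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    K = np.exp(-C / eps)          # Gibbs kernel
    u = np.ones(n)
    for _ in range(iters):        # alternate marginal projections
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]

P = sinkhorn(C)  # transport plan: P[i, j] = mass sent from latent i to data j
```

The plan `P` is a nonnegative matrix of total mass one whose row marginals recover the uniform latent measure, which is the sense in which transport "controls" the latent distribution; the regularization strength `eps` trades off blur in the plan against numerical stability.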
