Tensor Dropout for Robust Learning

CNNs achieve remarkable performance by leveraging deep, over-parametrized architectures trained on large datasets. However, they generalize poorly to data outside the training domain and lack robustness to noise and adversarial attacks. Better inductive biases can improve robustness while also yielding smaller networks that are more memory- and compute-efficient. Whereas standard CNN layers reduce to matrix computations, we study tensor layers that perform higher-order computations and provide a better inductive bias. Specifically, we impose low-rank tensor structure on the weights of tensor regression layers to obtain compact networks, and we propose tensor dropout, a randomization of the tensor rank, for robustness. We show that our approach outperforms competing methods on large-scale image classification with ImageNet and CIFAR-100. We also establish a new state-of-the-art accuracy for phenotypic trait prediction on the UK Biobank dataset, the largest collection of brain MRI, where multi-linear structure is paramount. In all cases, we demonstrate superior performance and significantly improved robustness to both noisy inputs and adversarial attacks. Finally, we ground our approach theoretically by establishing the link between our randomized decomposition and non-linear dropout.
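To make the core mechanism concrete, here is a minimal PyTorch sketch, not the paper's exact formulation: the weight of a tensor regression layer is stored in low-rank CP form, and tensor dropout keeps each rank-1 component with probability p during training, rescaling the survivors so the expected weight matches the full reconstruction. The class name CPTensorDropout, the keep probability p, and the example shapes are hypothetical choices made for this illustration.

import torch

class CPTensorDropout(torch.nn.Module):
    """Sketch of dropout over the rank-1 components of a CP-factorized weight.

    A rank-R CP weight is W = sum_{r=1}^{R} a_r (outer) b_r (outer) c_r.
    During training, each rank-1 component is kept independently with
    probability p and the surviving sum is rescaled by 1/p, so the
    expected weight equals the full-rank reconstruction; at evaluation
    time all R components are used, mirroring standard inverted dropout.
    """

    def __init__(self, rank, p=0.8):
        super().__init__()
        self.rank = rank
        self.p = p  # probability of KEEPING a rank-1 component (assumed value)

    def forward(self, factors):
        # factors: list of mode factor matrices, each of shape (dim_k, rank)
        if not self.training:
            return factors
        keep = torch.rand(self.rank) < self.p      # Bernoulli mask over the rank
        idx = keep.nonzero(as_tuple=True)[0]
        if idx.numel() == 0:                       # always keep at least one component
            idx = torch.randint(self.rank, (1,))
        dropped = [f[:, idx] for f in factors]     # dropping columns drops rank-1 terms
        dropped[0] = dropped[0] / self.p           # rescale each component exactly once
        return dropped

# Example: rank-16 CP factors of a (64, 3, 3) weight tensor.
factors = [torch.randn(d, 16) for d in (64, 3, 3)]
drop = CPTensorDropout(rank=16, p=0.8).train()
a, b, c = drop(factors)
w = torch.einsum('ir,jr,kr->ijk', a, b, c)  # reconstruct the randomly rank-reduced weight

Scaling only the first factor matrix (rather than every factor) rescales each surviving rank-1 term by 1/p overall, which is what preserves the expectation; the same idea extends to Tucker or tensor-ring parametrizations by randomizing their respective rank dimensions.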
