Stochastically Rank-Regularized Tensor Regression Networks

Over-parametrization of deep neural networks has recently been shown to be key to their successful training. However, it also renders them prone to overfitting and makes them expensive to store and train. Tensor regression networks significantly reduce the number of effective parameters in deep neural networks while retaining accuracy and ease of training. They replace the flattening and fully-connected layers with a tensor regression layer, in which the regression weights are expressed through the factors of a low-rank tensor decomposition. In this paper, to further improve tensor regression networks, we propose a novel stochastic rank-regularization. It relies on a randomized tensor sketching method that approximates the weights of tensor regression layers. We theoretically and empirically establish the link between the proposed stochastic rank-regularization and dropout on low-rank tensor regression. Extensive experimental results on both synthetic data and real-world datasets (i.e., CIFAR-100 and the UK Biobank brain MRI dataset) show that the proposed approach i) improves performance in both classification and regression tasks, ii) decreases overfitting, iii) leads to more stable training, and iv) improves robustness to adversarial attacks and random noise.
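To make the idea concrete, the following is a minimal NumPy sketch of a tensor regression layer whose weights are stored in Tucker form, with a Bernoulli mask that randomly drops rank components during training. It is an illustration under stated assumptions, not the authors' implementation: the choice of a Tucker decomposition, the function names (`tucker_to_tensor`, `bernoulli_rank_masks`, `trl_forward`), the keep probability, and the absence of any rescaling of the kept components are all assumptions made for this example.

```python
import numpy as np


def tucker_to_tensor(core, factors):
    """Rebuild a full tensor from a Tucker core and one factor matrix per mode."""
    full = core
    for mode, factor in enumerate(factors):
        # factor has shape (dim, rank); contract its rank axis with axis `mode`,
        # then move the resulting `dim` axis back into position `mode`.
        full = np.moveaxis(np.tensordot(factor, full, axes=(1, mode)), 0, mode)
    return full


def bernoulli_rank_masks(ranks, keep_prob, rng):
    """Hypothetical helper: one boolean keep-mask per rank dimension."""
    masks = []
    for rank in ranks:
        mask = rng.random(rank) < keep_prob
        if not mask.any():
            # Always keep at least one component so the weights stay non-empty.
            mask[rng.integers(rank)] = True
        masks.append(mask)
    return masks


def trl_forward(x, core, factors, bias, masks=None):
    """Tensor regression layer: inner product of x with low-rank weights.

    x has shape (batch, I1, ..., IN); the weights have shape (I1, ..., IN, O).
    If `masks` is given, only the selected rank components are used
    (assumed training-time behaviour; no rescaling is applied here).
    """
    if masks is not None:
        core = core[np.ix_(*masks)]
        factors = [f[:, m] for f, m in zip(factors, masks)]
    weights = tucker_to_tensor(core, factors)
    n_input_modes = x.ndim - 1
    # Contract all non-batch modes of x with the leading modes of the weights.
    return np.tensordot(x, weights, axes=n_input_modes) + bias


rng = np.random.default_rng(0)
in_shape, n_out, ranks = (4, 5, 6), 3, (2, 3, 3, 2)

core = rng.standard_normal(ranks)
factors = [
    rng.standard_normal((dim, rank))
    for dim, rank in zip(in_shape + (n_out,), ranks)
]
bias = np.zeros(n_out)
x = rng.standard_normal((8,) + in_shape)

# Evaluation: use the full low-rank weights.
y_eval = trl_forward(x, core, factors, bias)

# Training step: use a randomly selected subset of rank components.
masks = bernoulli_rank_masks(ranks, keep_prob=0.7, rng=rng)
y_train = trl_forward(x, core, factors, bias, masks)
```

In this sketch the layer never flattens the input: the weights contract directly with the activation tensor, and the stored parameters are the core and factors rather than the full weight tensor. With every mask entry set to `True`, the masked forward pass reduces to the unmasked one.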
