Optimal Low Rank Tensor Factorization for Deep Learning

Dense connectivity in latent variable models, recommender systems, and deep neural networks makes them resource intensive. As data keeps growing, memory and processing requirements grow with it. Since it is not always feasible to scale the underlying hardware, tensor methods are used to optimize these models and improve their performance in resource-constrained environments. Tensor methods make machine learning models fast and scalable; however, they introduce a trade-off between accuracy and resource requirements. In this paper, we explore the feasibility of converting dense weight matrices to the tensor-train format so that the number of parameters is reduced while the expressive power of the layers is preserved. Based on observations of the effect of tensor rank, we propose a novel decomposition method that preserves the underlying model's accuracy while retaining the time and space savings offered by tensor methods.
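To make the tensor-train idea concrete, the sketch below shows the standard TT-SVD procedure applied to a dense fully connected weight matrix that has been reshaped into a higher-order tensor. This is a minimal illustration in Python with NumPy, assuming a 256 x 256 weight, a reshape into an 8-way tensor of mode size 4, a maximum TT-rank of 16, and hypothetical helper names (tt_svd, tt_reconstruct); it is not the decomposition method proposed in the paper.

```python
# A minimal TT-SVD sketch, assuming NumPy only. The layer size, the reshape
# into an 8-way tensor, the max_rank value, and the helper names are
# illustrative assumptions, not the paper's proposed method.
import numpy as np

def tt_svd(tensor, max_rank):
    """Factor an n-way tensor into tensor-train cores via sequential truncated SVDs."""
    shape = tensor.shape
    cores = []
    rank_prev = 1
    unfolding = tensor
    for k in range(len(shape) - 1):
        # Unfold so that the previous rank and the current mode index the rows.
        unfolding = unfolding.reshape(rank_prev * shape[k], -1)
        u, s, vt = np.linalg.svd(unfolding, full_matrices=False)
        rank = min(max_rank, s.size)                  # truncate to the requested TT-rank
        cores.append(u[:, :rank].reshape(rank_prev, shape[k], rank))
        unfolding = s[:rank, None] * vt[:rank]        # carry the residual factor forward
        rank_prev = rank
    cores.append(unfolding.reshape(rank_prev, shape[-1], 1))
    return cores

def tt_reconstruct(cores):
    """Contract the TT cores back into the full tensor (used here to measure error)."""
    result = cores[0]
    for core in cores[1:]:
        result = np.tensordot(result, core, axes=([-1], [0]))
    return result[0, ..., 0]                          # drop the boundary rank-1 axes

# Treat a dense fully connected weight matrix as an 8-way tensor and compress it.
rng = np.random.default_rng(0)
W = rng.standard_normal((256, 256))                   # stand-in for a trained dense weight
cores = tt_svd(W.reshape((4,) * 8), max_rank=16)
W_tt = tt_reconstruct(cores).reshape(256, 256)

dense_params = W.size
tt_params = sum(core.size for core in cores)
rel_err = np.linalg.norm(W - W_tt) / np.linalg.norm(W)
print(f"dense params: {dense_params}, TT params: {tt_params}, relative error: {rel_err:.3f}")
```

With these assumed settings the TT cores hold a few thousand parameters instead of 65,536 for the dense matrix. The reconstruction error is large for a random matrix, which has no low-rank structure to exploit; trained layer weights typically compress far better, and choosing the TT-ranks to balance that compression against accuracy is exactly the trade-off the paper targets.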
