Minimax Lower Bounds on Dictionary Learning for Tensor Data

This paper provides fundamental limits on the sample complexity of estimating dictionaries for tensor data. The specific focus of this work is on $K$th-order tensor data and the case where the underlying dictionary can be expressed in terms of $K$ smaller dictionaries. It is assumed that the data are generated as linear combinations of the structured dictionary's atoms and observed in white Gaussian noise. This work first provides a general lower bound on the minimax risk of dictionary learning for such tensor data and then adapts the proof techniques to obtain specialized results for the cases of sparse and sparse-Gaussian linear combinations. The results suggest that the sample complexity of dictionary learning for tensor data can be significantly lower than that for unstructured data: for unstructured data it scales linearly with the product of the dictionary dimensions, whereas for tensor-structured data the bound scales linearly with the sum of the products of the dimensions of the (smaller) component dictionaries. A partial converse is provided for the case of 2nd-order tensor data to show that the bounds in this paper can be tight. This involves developing an algorithm for learning highly structured dictionaries from noisy tensor data. Finally, numerical experiments highlight the advantages associated with explicitly accounting for tensor data structure during dictionary learning.
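To make the scaling comparison concrete, here is a minimal sketch of the assumed generative model; the notation below is illustrative (the abstract itself fixes no symbols), and the Kronecker-product form of the structured dictionary is taken from the related literature on Kronecker-structured dictionary learning:
\[
\mathbf{y} \;=\; \left(\mathbf{D}_K \otimes \mathbf{D}_{K-1} \otimes \cdots \otimes \mathbf{D}_1\right)\mathbf{x} \;+\; \mathbf{w},
\qquad \mathbf{w} \sim \mathcal{N}\!\left(\mathbf{0}, \sigma^2 \mathbf{I}\right),
\]
where $\mathbf{y}$ is a vectorized $K$th-order tensor observation, each component dictionary $\mathbf{D}_k$ is $m_k \times p_k$, and $\mathbf{x}$ holds the (e.g., sparse or sparse-Gaussian) coefficients. The overall dictionary is $m \times p$ with $m = \prod_{k=1}^{K} m_k$ and $p = \prod_{k=1}^{K} p_k$, so an unstructured lower bound scales with $mp$, whereas the structured bounds discussed here scale with $\sum_{k=1}^{K} m_k p_k$, the number of free parameters in the component dictionaries.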
