Identifiability of Complete Dictionary Learning

Sparse component analysis (SCA), also known as complete dictionary learning, is the following problem: given an input matrix $M$ and an integer $r$, find a dictionary $D$ with $r$ columns and a matrix $B$ with $k$-sparse columns (that is, each column of $B$ has at most $k$ non-zero entries) such that $M \approx DB$. A key issue in SCA is identifiability, that is, characterizing the conditions under which $D$ and $B$ are essentially unique (unique up to permutation and scaling of the columns of $D$ and the rows of $B$). Although SCA has been extensively investigated over the last two decades, only a few works have tackled this issue in the deterministic scenario, and none provides reasonable bounds on the minimum number of samples (that is, columns of $M$) that leads to identifiability. In this work, we provide new results in the deterministic scenario when the data has a low-rank structure, that is, when $D$ is (under)complete. While previous bounds feature a combinatorial term $\binom{r}{k}$, we exhibit a sufficient condition involving $\mathcal{O}(r^3/(r-k)^2)$ samples that yields an essentially unique decomposition, as long as these data points are well spread among the subspaces spanned by $r-1$ columns of $D$. We also exhibit a necessary lower bound on the number of samples that contradicts previous results in the literature when $k$ equals $r-1$. Our bounds provide a drastic improvement over the state of the art, and imply, for example, that for a fixed proportion of zeros (constant and independent of $r$, e.g., 10\% of zero entries in $B$), one only requires $\mathcal{O}(r)$ data points to guarantee identifiability.
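
To see how the $\mathcal{O}(r)$ claim follows from the sufficient condition, here is a short worked computation. The symbol $\rho$ (the fixed proportion of zero entries per column of $B$) is introduced here only for illustration and does not appear in the abstract; the constants hidden in the $\mathcal{O}(\cdot)$ notation are ignored. Assume each column of $B$ has at least $\rho r$ zeros, so that $k = (1-\rho)r$ and $r - k = \rho r$. Then

$$
\frac{r^3}{(r-k)^2} \;=\; \frac{r^3}{(\rho r)^2} \;=\; \frac{r}{\rho^2},
$$

which is linear in $r$ when $\rho$ is constant. For $\rho = 0.1$ (10\% zeros), this gives $r^3/(r-k)^2 = 100\,r$. By contrast, assuming $\rho r$ is an integer, the combinatorial term satisfies $\binom{r}{k} = \binom{r}{\rho r} \geq (1/\rho)^{\rho r}$, which grows exponentially in $r$ for the same fixed $\rho$.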
