Paper Information - Identifiability of Complete Dictionary Learning

Identifiability of Complete Dictionary Learning

Sparse component analysis (SCA), also known as complete dictionary learning, is the following problem: given an input matrix $M$ and an integer $r$, find a dictionary $D$ with $r$ columns and a matrix $B$ with $k$-sparse columns (that is, each column of $B$ has at most $k$ nonzero entries) such that $M \approx DB$. A key issue in SCA is identifiability, that is, characterizing the conditions under which $D$ and $B$ are essentially unique (unique up to permutation and scaling of the columns of $D$ and rows of $B$). Although SCA has been extensively investigated over the last two decades, only a few works have tackled this issue in the deterministic scenario, and none provides reasonable bounds on the minimum number of samples (that is, columns of $M$) that leads to identifiability. In this work, we provide new results in the deterministic scenario when the data has a low-rank structure, that is, when $D$ is (under)complete. While previous bounds feature a combinatorial term $\binom{r}{k}$, we exhibit a sufficient condition involving $\mathcal{O}(r^3/(r-k)^2)$ samples that yields an essentially unique decomposition, as long as these data points are well spread among the subspaces spanned by $r-1$ columns of $D$. We also exhibit a necessary lower bound on the number of samples that contradicts previous results in the literature when $k$ equals $r-1$. Our bounds provide a drastic improvement over the state of the art, and imply for example that for a fixed proportion of zeros (constant and independent of $r$, e.g., 10% of zero entries in $B$), one only requires $\mathcal{O}(r)$ data points to guarantee identifiability.
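As a rough illustration of the model $M = DB$ and of the two sample-count terms mentioned in the abstract, the following Python sketch may help. It is not the authors' code: the function names (`make_sca_instance`, `polynomial_term`), the Gaussian sampling, and the chosen sizes are assumptions made only for this example. The constants hidden by the $\mathcal{O}(\cdot)$ notation are ignored, so the printed numbers indicate growth rates, not actual sample requirements.

```python
import math
import numpy as np


def make_sca_instance(m, r, k, n, seed=0):
    """Build a synthetic SCA instance M = D @ B (hypothetical helper).

    D is an m-by-r Gaussian dictionary (m >= r, so D is (under)complete),
    and each of the n columns of B has exactly k nonzero entries.
    """
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((m, r))
    B = np.zeros((r, n))
    for j in range(n):
        # Pick the support of column j, then fill it with random values.
        support = rng.choice(r, size=k, replace=False)
        B[support, j] = rng.standard_normal(k)
    return D, B, D @ B


def polynomial_term(r, k):
    """The r^3 / (r - k)^2 term from the abstract, without hidden constants."""
    return r**3 / (r - k) ** 2


D, B, M = make_sca_instance(m=8, r=6, k=3, n=50)
print("M shape:", M.shape)
print("max nonzeros per column of B:", int(np.count_nonzero(B, axis=0).max()))

# Fixed proportion of zeros: 10% of the entries in each column of B are zero.
for r in (10, 50, 100):
    k = r - r // 10
    print(r, k, math.comb(r, k), polynomial_term(r, k))
```

With 10% zeros, $r - k = r/10$, so $r^3/(r-k)^2 = 100\,r$, which is linear in $r$ and matches the $\mathcal{O}(r)$ claim. The binomial term $\binom{r}{k}$ is smaller for very small $r$ but grows much faster as $r$ increases.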

Nicolas Gillis | Jérémy E. Cohen
