An improved analysis of the ER-SpUD dictionary learning algorithm

In "dictionary learning" we observe $Y = AX + E$ for some $Y\in\mathbb{R}^{n\times p}$, $A \in\mathbb{R}^{m\times n}$, and $X\in\mathbb{R}^{m\times p}$. The matrix $Y$ is observed, and $A, X, E$ are unknown. Here $E$ is "noise" of small norm, and $X$ is column-wise sparse. The matrix $A$ is referred to as a {\em dictionary}, and its columns as {\em atoms}. Then, given some small number $p$ of samples, i.e.\ columns of $Y$, the goal is to learn the dictionary $A$ up to small error, as well as $X$. The motivation is that in many applications data is expected to sparse when represented by atoms in the "right" dictionary $A$ (e.g.\ images in the Haar wavelet basis), and the goal is to learn $A$ from the data to then use it for other applications. Recently, [SWW12] proposed the dictionary learning algorithm ER-SpUD with provable guarantees when $E = 0$ and $m = n$. They showed if $X$ has independent entries with an expected $s$ non-zeroes per column for $1 \lesssim s \lesssim \sqrt{n}$, and with non-zero entries being subgaussian, then for $p\gtrsim n^2\log^2 n$ with high probability ER-SpUD outputs matrices $A', X'$ which equal $A, X$ up to permuting and scaling columns (resp.\ rows) of $A$ (resp.\ $X$). They conjectured $p\gtrsim n\log n$ suffices, which they showed was information theoretically necessary for {\em any} algorithm to succeed when $s \simeq 1$. Significant progress was later obtained in [LV15]. We show that for a slight variant of ER-SpUD, $p\gtrsim n\log(n/\delta)$ samples suffice for successful recovery with probability $1-\delta$. We also show that for the unmodified ER-SpUD, $p\gtrsim n^{1.99}$ samples are required even to learn $A, X$ with polynomially small success probability. This resolves the main conjecture of [SWW12], and contradicts the main result of [LV15], which claimed that $p\gtrsim n\log^4 n$ guarantees success whp.

[1] Kyle Luh, Van Vu. Random Matrices: l1 Concentration and Dictionary Learning with Few Samples. FOCS, 2015.

[2] Daniel A. Spielman, Huan Wang, John Wright. Exact Recovery of Sparsely-Used Dictionaries. COLT, 2012.

[3] Aditya Bhaskara et al. More Algorithms for Provable Dictionary Learning. arXiv, 2014.

[4] David Steurer et al. Dictionary Learning and Tensor Decomposition via the Sum-of-Squares Method. STOC, 2014.

[5] Gábor Lugosi et al. Concentration Inequalities: A Nonasymptotic Theory of Independence. 2013.

[6] Sjoerd Dirksen et al. Tail bounds via generic chaining. arXiv, 2013.

[7] Michael Elad et al. Image Denoising Via Sparse and Redundant Representations Over Learned Dictionaries. IEEE Transactions on Image Processing, 2006.

[8] Prateek Jain et al. Learning Sparsely Used Overcomplete Dictionaries. COLT, 2014.

[9] Alan M. Frieze et al. Learning linear transformations. FOCS, 1996.

[10] Jean Ponce et al. Sparse Modeling for Image and Vision Processing. Found. Trends Comput. Graph. Vis., 2014.

[11] Radoslaw Adamczak et al. A Note on the Sample Complexity of the Er-SpUD Algorithm by Spielman, Wang and Wright for Exact Recovery of Sparsely Used Dictionaries. J. Mach. Learn. Res., 2016.

[12] John Wright et al. Complete dictionary recovery over the sphere. SampTA, 2015.

[13] Guillermo Sapiro et al. Online Learning for Matrix Factorization and Sparse Coding. J. Mach. Learn. Res., 2009.

[14] Phong Q. Nguyen et al. Learning a Parallelepiped: Cryptanalysis of GGH and NTRU Signatures. Journal of Cryptology, 2009.

[15] M. Elad et al. K-SVD: An Algorithm for Designing Overcomplete Dictionaries for Sparse Representation. IEEE Transactions on Signal Processing, 2006.

[16] M. Talagrand et al. Probability in Banach Spaces: Isoperimetry and Processes. 1991.

[17] Sanjeev Arora et al. Provable ICA with Unknown Gaussian Noise, and Implications for Gaussian Mixtures and Autoencoders. Algorithmica, 2012.

[18] Yong Xu et al. Sparse Representation for Brain Signal Processing: A tutorial on methods and applications. IEEE Signal Processing Magazine, 2014.

[19] Sanjeev Arora et al. New Algorithms for Learning Incoherent and Overcomplete Dictionaries. COLT, 2013.

[20] Rajat Raina et al. Self-taught learning: transfer learning from unlabeled data. ICML, 2007.

[21] M. Talagrand. Upper and Lower Bounds for Stochastic Processes: Modern Methods and Classical Problems. 2014.

[22] Mikhail Belkin et al. Blind Signal Separation in the Presence of Gaussian Noise. COLT, 2012.

[23] A. Bruckstein et al. K-SVD: An Algorithm for Designing of Overcomplete Dictionaries for Sparse Representation. 2005.

[24] Santosh S. Vempala et al. Fourier PCA and robust tensor decomposition. STOC, 2013.

[25] Santosh S. Vempala et al. Max vs Min: Tensor Decomposition and ICA with nearly Linear Sample Complexity. COLT, 2014.

[26] Michael Elad et al. Compression of facial images using the K-SVD algorithm. J. Vis. Commun. Image Represent., 2008.

[27] Guillermo Sapiro et al. Non-local sparse models for image restoration. ICCV, 2009.

[28] Guillermo Sapiro et al. Supervised Dictionary Learning. NIPS, 2008.