论文信息 - The Statistical Inefficiency of Sparse Coding for Images (or, One Gabor to Rule them All)

The Statistical Inefficiency of Sparse Coding for Images (or, One Gabor to Rule them All)

Sparse coding is a proven principle for learning compact representations of images. However, sparse coding by itself often leads to very redundant dictionaries. With images, this often takes the form of similar edge detectors which are replicated many times at various positions, scales and orientations. An immediate consequence of this observation is that the estimation of the dictionary components is not statistically efficient. We propose a factored model in which factors of variation (e.g. position, scale and orientation) are untangled from the underlying Gabor-like filters. There is so much redundancy in sparse codes for natural images that our model requires only a single dictionary element (a Gabor-like edge detector) to outperform standard sparse coding. Our model scales naturally to arbitrary-sized images while achieving much greater statistical efficiency during learning. We validate this claim with a number of experiments showing, in part, superior compression of out-of-sample data using a sparse coding dictionary learned with only a single image.

Yoshua Bengio | Aaron C. Courville | James Bergstra | Yoshua Bengio | J. Bergstra

[1] Rajat Raina,et al. Self-taught learning: transfer learning from unlabeled data , 2007, ICML '07.

[2] David M. Bradley,et al. Differentiable Sparse Coding , 2008, NIPS.

[3] Rajesh P. N. Rao,et al. Bilinear Sparse Coding for Invariant Vision , 2005, Neural Computation.

[4] Rajat Raina,et al. Efficient sparse coding algorithms , 2006, NIPS.

[5] Terrence J. Sejnowski,et al. Learning Overcomplete Representations , 2000, Neural Computation.

[6] David J. Field,et al. Emergence of simple-cell receptive field properties by learning a sparse code for natural images , 1996, Nature.

[7] David J. Field,et al. Sparse coding with an overcomplete basis set: A strategy employed by V1? , 1997, Vision Research.

[8] Geoffrey E. Hinton,et al. Factored 3-Way Restricted Boltzmann Machines For Modeling Natural Images , 2010, AISTATS.

[9] Joshua B. Tenenbaum,et al. Separating Style and Content with Bilinear Models , 2000, Neural Computation.

[10] Alex Krizhevsky,et al. Learning Multiple Layers of Features from Tiny Images , 2009 .

[11] Michael Elad,et al. Image Denoising Via Sparse and Redundant Representations Over Learned Dictionaries , 2006, IEEE Transactions on Image Processing.