What is the Best Feature Learning Procedure in Hierarchical Recognition Architectures?

(This paper was written in November 2011 and never published. It is posted on arXiv.org in its original form in June 2016). Many recent object recognition systems have proposed using a two phase training procedure to learn sparse convolutional feature hierarchies: unsupervised pre-training followed by supervised fine-tuning. Recent results suggest that these methods provide little improvement over purely supervised systems when the appropriate nonlinearities are included. This paper presents an empirical exploration of the space of learning procedures for sparse convolutional networks to assess which method produces the best performance. In our study, we introduce an augmentation of the Predictive Sparse Decomposition method that includes a discriminative term (DPSD). We also introduce a new single phase supervised learning procedure that places an L1 penalty on the output state of each layer of the network. This forces the network to produce sparse codes without the expensive pre-training phase. Using DPSD with a new, complex predictor that incorporates lateral inhibition, combined with multi-scale feature pooling, and supervised refinement, the system achieves a 70.6\% recognition rate on Caltech-101. With the addition of convolutional training, a 77\% recognition was obtained on the CIfAR-10 dataset.

[1]  Jean Ponce,et al.  Learning mid-level features for recognition , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[2]  Marc'Aurelio Ranzato,et al.  Fast Inference in Sparse Coding Algorithms with Applications to Object Recognition , 2010, ArXiv.

[3]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[4]  Honglak Lee,et al.  Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations , 2009, ICML '09.

[5]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[6]  Tong Zhang,et al.  Improved Local Coordinate Coding using Local Tangents , 2010, ICML.

[7]  I. Daubechies,et al.  An iterative thresholding algorithm for linear inverse problems with a sparsity constraint , 2003, math/0307152.

[8]  Yihong Gong,et al.  Linear spatial pyramid matching using sparse coding for image classification , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[9]  Marc Teboulle,et al.  A fast Iterative Shrinkage-Thresholding Algorithm with application to wavelet-based image deblurring , 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.

[10]  Honglak Lee,et al.  Sparse deep belief net model for visual area V2 , 2007, NIPS.

[11]  Eero P. Simoncelli,et al.  A model of neuronal responses in visual area MT , 1998, Vision Research.

[12]  Eero P. Simoncelli,et al.  Nonlinear image representation using divisive normalization , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[13]  Alex Krizhevsky,et al.  Learning Multiple Layers of Features from Tiny Images , 2009 .

[14]  Yann LeCun,et al.  What is the best multi-stage architecture for object recognition? , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[15]  Martin J. Wainwright,et al.  Image denoising using scale mixtures of Gaussians in the wavelet domain , 2003, IEEE Trans. Image Process..

[16]  Guillermo Sapiro,et al.  Discriminative learned dictionaries for local image analysis , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[17]  Stéphane Mallat,et al.  Matching pursuits with time-frequency dictionaries , 1993, IEEE Trans. Signal Process..

[18]  D. Heeger Normalization of cell responses in cat striate cortex , 1992, Visual Neuroscience.

[19]  Graham W. Taylor,et al.  Deconvolutional networks , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[20]  Guillermo Sapiro,et al.  Online Learning for Matrix Factorization and Sparse Coding , 2009, J. Mach. Learn. Res..

[21]  Geoffrey E. Hinton,et al.  Modeling pixel means and covariances using factorized third-order boltzmann machines , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[22]  Guillermo Sapiro,et al.  Supervised Dictionary Learning , 2008, NIPS.

[23]  David J. Field,et al.  Sparse coding with an overcomplete basis set: A strategy employed by V1? , 1997, Vision Research.

[24]  Yann LeCun,et al.  Learning Fast Approximations of Sparse Coding , 2010, ICML.

[25]  Marc'Aurelio Ranzato,et al.  Sparse Feature Learning for Deep Belief Networks , 2007, NIPS.

[26]  Yihong Gong,et al.  Training Hierarchical Feed-Forward Visual Recognition Models Using Transfer Learning from Pseudo-Tasks , 2008, ECCV.