Learning optimal features for visual pattern recognition

The optimal coding hypothesis proposes that the human visual system has adapted to the statistical properties of the environment by means of relatively simple optimality criteria. Here we (i) discuss how the properties of different models of image coding, i.e. sparseness, decorrelation, and statistical independence, are related to each other, (ii) propose to evaluate the different models by verifiable performance measures, and (iii) analyse the classification performance on images of handwritten digits (MNIST database). We first employ the SPARSENET algorithm (Olshausen, 1998) to derive a local filter basis (on 13 × 13 pixel windows). We then filter the images in the database (28 × 28 pixel images of digits) and reduce the dimensionality of the resulting feature space by selecting the locally maximal filter responses. Finally, we train a support vector machine on a training set to classify the digits and report results obtained on a separate test set. The best result currently reported on the MNIST database is an error rate of 0.4%. That result, however, was obtained by using explicit knowledge that is specific to the data (an elastic distortion model for digits). We obtain an error rate of 0.55%, which is second best but does not rely on explicit data-specific knowledge. In particular, it outperforms by far all methods that do not use data-specific knowledge.
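The feature-extraction stage described above (filtering with a local basis, then keeping locally maximal responses) can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the implementation used for the reported results: the random filters stand in for a basis learned with SPARSENET, the number of filters (8) and the non-overlapping 4 × 4 pooling block are hypothetical choices, and the function names are invented for this example. The resulting feature vectors would then be passed to a support vector machine, which is not shown here.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def filter_responses(image, filters):
    """Cross-correlate one image with every filter ('valid' positions only).

    image:   (H, W) array
    filters: (F, k, k) array
    returns: (F, H - k + 1, W - k + 1) array of filter responses
    """
    k = filters.shape[-1]
    # All k x k windows of the image, shape (H-k+1, W-k+1, k, k).
    patches = sliding_window_view(image, (k, k))
    # Inner product of every window with every filter.
    return np.einsum("ijkl,fkl->fij", patches, filters)


def local_max_pool(responses, block):
    """Keep the maximal response inside each non-overlapping block.

    Assumption: 'locally maximal' is interpreted here as block-wise max
    pooling; the original selection rule may differ.
    """
    f, h, w = responses.shape
    h2, w2 = h // block, w // block
    cropped = responses[:, : h2 * block, : w2 * block]
    return cropped.reshape(f, h2, block, w2, block).max(axis=(2, 4))


def extract_features(image, filters, block=4):
    """Filter, pool, and flatten into one feature vector per image."""
    return local_max_pool(filter_responses(image, filters), block).ravel()


rng = np.random.default_rng(0)
filters = rng.standard_normal((8, 13, 13))  # stand-in for a learned 13 x 13 basis
image = rng.random((28, 28))                # stand-in for one 28 x 28 MNIST digit

responses = filter_responses(image, filters)  # 16 x 16 map per filter
features = extract_features(image, filters)   # 8 filters * 4 * 4 pooled values
print(responses.shape, features.shape)
```

With 13 × 13 filters on a 28 × 28 image, each response map has 16 × 16 valid positions, so pooling over 4 × 4 blocks reduces every map from 256 values to 16.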
