Robust classification of objects, faces, and flowers using natural image statistics

Classification of images in many-category datasets has improved rapidly in recent years. However, systems that perform well on particular datasets typically have one or more limitations, such as a failure to generalize across visual tasks (e.g., requiring a face detector or extensive retuning of parameters), insufficient translation invariance, an inability to cope with partial views and occlusion, or significant performance degradation as the number of classes increases. Here we attempt to overcome these challenges using a model that combines sequential, fixation-based visual attention with sparse coding. The model's biologically inspired filters are acquired through unsupervised learning applied to natural image patches. Using only a single feature type, our approach achieves 78.5% accuracy on Caltech-101 and 75.2% on the 102 Flowers dataset when trained on 30 instances per class, and it achieves 92.7% accuracy on the AR Face database with 1 training instance per person. The same features and parameters are used across these datasets to illustrate the robustness of the approach.
