What you need to know about the state-of-the-art computational models of object-vision: A tour through the models

Models of object vision have been of great interest in both computer vision and visual neuroscience. Over the last few decades, several models have been developed to extract visual features from images for object recognition tasks. Some were inspired by the hierarchical structure of the primate visual system; others were engineered models. The models differ in several respects. Some are trained with supervision, some are trained without supervision, and some (e.g. feature extractors) are fully hard-wired and need no training. Some have a deep hierarchical structure consisting of several layers, whereas others are shallow, with only one or two layers of processing. More recently, models have been developed that are not hand-tuned but are trained on millions of images, from which they learn to extract informative, task-related features. Here I survey these different models and give the reader both an intuitive and a more detailed understanding of the computations underlying each of them.
