Biologically inspired task oriented gist model for scene classification

Capturing the scene gist accounts for rapid and accurate scene classification in the human visual system. This paper presents a biologically inspired task oriented gist model (BT-Gist) that attempts to emulate two important attributes of biological gist: holistic scene centered spatial layout representation and task oriented resolution determination. For the first attribute, we enrich the model of Oliva and Torralba by refining the low-level features in several biologically plausible ways, extending the spatial layout to multiple resolutions, and applying perceptually meaningful manifold analysis to obtain a set of multi-resolution biologically inspired intrinsic manifold spatial layouts (BMSLs). Since the optimal resolution that best represents the spatial layout varies from task to task, we embody the second attribute as learning the combination of BMSLs of multiple resolutions with respect to their optimal discriminative invariance trade-off for the task at hand. We cast this problem in the SVM based localized multiple kernel learning (LMKL) framework, in which the kernel of each scene gist is approximated as a local combination of the kernels associated with the multi-resolution BMSLs. By exploring the task specific category distribution pattern over each BMSL, we define the local model as a category distribution sensitive (CDS) kernel, which accommodates both the diverse individuality of a specific BMSL and the universality shared within the whole category space. Via CDS-LMKL, both the optimal resolution for spatial layouts and the final classifier can be obtained efficiently and jointly. We evaluate BT-Gist on four natural scene databases and one cluttered indoor scene database with a range of comparisons, from different MKL methods to various biologically inspired models and bag-of-features (BoF) based computer vision models. CDS-LMKL leads to better results than several existing MKL algorithms. Given the two biological attributes that the framework has to follow, BT-Gist, despite its holistic nature, outperforms existing biologically inspired models and BoF based computer vision models in natural scene classification, and competes with the object segmentation based ROI-Gist in cluttered indoor scene classification.
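
The local combination of kernels mentioned above can be illustrated with a minimal sketch of a generic localized multiple kernel, k(x, x') = sum_m eta_m(x) k_m(x, x') eta_m(x'), where m indexes the resolutions. This is not the CDS kernel of the paper: the softmax gating function, the RBF base kernels, and all names (`rbf_kernel`, `softmax_gate`, `localized_kernel`, `gamma`, `V`, `v0`) are assumptions chosen for illustration, and the gating parameters are taken as given rather than learned jointly with the SVM.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """RBF base kernel between rows of A (n, d) and rows of B (p, d)."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def softmax_gate(G, V, v0):
    """Hypothetical gating model: per-sample weights over M resolutions.

    G: (n, d) gating features, V: (d, M), v0: (M,). Rows of the result sum to 1.
    """
    z = G @ V + v0
    z -= z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def localized_kernel(feats_a, feats_b, gate_a, gate_b, gamma=1.0):
    """Locally combined kernel: sum over m of eta_m(x) * k_m(x, x') * eta_m(x').

    feats_a, feats_b: lists of M arrays, one per resolution (dimensions may differ
    across resolutions). gate_a, gate_b: (n, M) and (p, M) gating weights.
    """
    K = np.zeros((gate_a.shape[0], gate_b.shape[0]))
    for m, (A, B) in enumerate(zip(feats_a, feats_b)):
        K += np.outer(gate_a[:, m], gate_b[:, m]) * rbf_kernel(A, B, gamma)
    return K
```

Because each term is the elementwise product of a rank-one positive semidefinite matrix and a positive semidefinite base kernel, the combined matrix on a training set stays positive semidefinite and could be passed to an SVM solver that accepts precomputed kernels.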
