Building holistic descriptors for scene recognition: a multi-objective genetic programming approach

Real-world scene recognition is one of the most challenging research topics in computer vision because of the tremendous intraclass variability and the wide range of scene categories. In this paper, we apply an evolutionary methodology to automatically synthesize domain-adaptive holistic descriptors for scene recognition, instead of relying on hand-tuned descriptors. We pose descriptor design as an optimization problem and solve it with multi-objective genetic programming (MOGP). Specifically, a set of primitive operators and filters is first randomly assembled in the MOGP framework into tree-based combinations, which are then evaluated by two fitness criteria: the classification error and the tree complexity. Finally, the best-so-far solution selected by MOGP is taken as the (near-)optimal feature descriptor for scene recognition. We have evaluated our approach on three realistic scene datasets: MIT urban and nature, SUN, and UIUC Sport. Experimental results consistently show that our MOGP-generated descriptors achieve significantly higher recognition accuracies than state-of-the-art hand-crafted and machine-learned features.
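
The two-criterion evaluation described above can be illustrated with a minimal sketch of Pareto-dominance selection, assuming both classification error and tree complexity are minimized. The abstract does not state which selection scheme the MOGP framework uses, so the function names, the non-dominated filtering, and the fitness values below are hypothetical and only show how two objectives can be compared without collapsing them into one score.

```python
from typing import List, Tuple

# Each candidate descriptor tree is summarized by two objectives,
# both minimized: (classification_error, tree_complexity).
# Here tree_complexity is assumed to be a node count.
Fitness = Tuple[float, int]


def dominates(a: Fitness, b: Fitness) -> bool:
    """True if a is no worse than b on both objectives and strictly better on at least one."""
    no_worse = a[0] <= b[0] and a[1] <= b[1]
    strictly_better = a[0] < b[0] or a[1] < b[1]
    return no_worse and strictly_better


def pareto_front(population: List[Fitness]) -> List[Fitness]:
    """Return the candidates that no other candidate dominates, in input order."""
    return [p for p in population
            if not any(dominates(q, p) for q in population)]


# Hypothetical fitness values for four candidate trees (not results from the paper).
population = [(0.20, 15), (0.25, 8), (0.20, 30), (0.30, 9)]
front = pareto_front(population)
print(front)
```

In this toy population, the tree with error 0.20 and 30 nodes is dominated by the equally accurate 15-node tree, and the tree with error 0.30 and 9 nodes is dominated by the smaller, more accurate 8-node tree, so only two candidates remain on the front. A single "best-so-far" descriptor would then have to be chosen from that front by some further rule, which the abstract does not specify.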
