Natural Scene Recognition Based on Superpixels and Deep Boltzmann Machines

The Deep Boltzmann Machines (DBM) is a state-of-the-art unsupervised learning model, which has been successfully applied to handwritten digit recognition and, as well as object recognition. However, the DBM is limited in scene recognition due to the fact that natural scene images are usually very large. In this paper, an efficient scene recognition approach is proposed based on superpixels and the DBMs. First, a simple linear iterative clustering (SLIC) algorithm is employed to generate superpixels of input images, where each superpixel is regarded as an input of a learning model. Then, a two-layer DBM model is constructed by stacking two restricted Boltzmann machines (RBMs), and a greedy layer-wise algorithm is applied to train the DBM model. Finally, a softmax regression is utilized to categorize scene images. The proposed technique can effectively reduce the computational complexity and enhance the performance for large natural image recognition. The approach is verified and evaluated by extensive experiments, including the fifteen-scene categories dataset the UIUC eight-sports dataset, and the SIFT flow dataset, are used to evaluate the proposed method. The experimental results show that the proposed approach outperforms other state-of-the-art methods in terms of recognition rate.

[1]  Antonio Torralba,et al.  Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope , 2001, International Journal of Computer Vision.

[2]  Pedro F. Felzenszwalb,et al.  Reconfigurable models for scene recognition , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[3]  Andrew W. Fitzgibbon,et al.  Efficient Object Category Recognition Using Classemes , 2010, ECCV.

[4]  Xiangyang Xue,et al.  Learning Hybrid Part Filters for Scene Recognition , 2012, ECCV.

[5]  Hao Su,et al.  Object Bank: A High-Level Image Representation for Scene Classification & Semantic Feature Sparsification , 2010, NIPS.

[6]  Klaus-Robert Müller,et al.  Deep Boltzmann Machines and the Centering Trick , 2012, Neural Networks: Tricks of the Trade.

[7]  Camille Couprie,et al.  Learning Hierarchical Features for Scene Labeling , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[8]  Geoffrey E. Hinton,et al.  A Better Way to Pretrain Deep Boltzmann Machines , 2012, NIPS.

[9]  Geoffrey E. Hinton Training Products of Experts by Minimizing Contrastive Divergence , 2002, Neural Computation.

[10]  Fereshteh Sadeghi,et al.  Latent Pyramidal Regions for Recognizing Scenes , 2012, ECCV.

[11]  Andrew Zisserman,et al.  Scene Classification Using a Hybrid Generative/Discriminative Approach , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[12]  James L. McClelland,et al.  Parallel distributed processing: explorations in the microstructure of cognition, vol. 1: foundations , 1986 .

[13]  James M. Rehg,et al.  CENTRIST: A Visual Descriptor for Scene Categorization , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[14]  Yoshua. Bengio,et al.  Learning Deep Architectures for AI , 2007, Found. Trends Mach. Learn..

[15]  Jean Ponce,et al.  Learning mid-level features for recognition , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[16]  Yee Whye Teh,et al.  A Fast Learning Algorithm for Deep Belief Nets , 2006, Neural Computation.

[17]  Yuning Jiang,et al.  Randomized Spatial Partition for Scene Recognition , 2012, ECCV.

[18]  Nuno Vasconcelos,et al.  Scene Recognition on the Semantic Manifold , 2012, ECCV.

[19]  Robert M. Haralick,et al.  Textural Features for Image Classification , 1973, IEEE Trans. Syst. Man Cybern..

[20]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[21]  Zhuowen Tu,et al.  Harvesting Mid-level Visual Concepts from Large-Scale Internet Images , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[22]  Fei-Fei Li,et al.  What, where and who? Classifying events by scene and object recognition , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[23]  Pascal Fua,et al.  SLIC Superpixels Compared to State-of-the-Art Superpixel Methods , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[24]  Geoffrey E. Hinton,et al.  Deep Belief Networks for phone recognition , 2009 .

[25]  Nitish Srivastava,et al.  Multimodal learning with deep Boltzmann machines , 2012, J. Mach. Learn. Res..

[26]  Cewu Lu,et al.  Learning Important Spatial Pooling Regions for Scene Classification , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[27]  A. Robertson Historical development of CIE recommended color difference equations , 1990 .

[28]  Liang-Tien Chia,et al.  Local features are not lonely – Laplacian sparse coding for image classification , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[29]  Jitendra Malik,et al.  Learning a classification model for segmentation , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[30]  Antonio Torralba,et al.  Nonparametric scene parsing: Label transfer via dense scene alignment , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[31]  James M. Rehg,et al.  Beyond the Euclidean distance: Creating effective visual codebooks using the Histogram Intersection Kernel , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[32]  Hervé Le Borgne,et al.  Locality-constrained and spatially regularized coding for scene categorization , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[33]  Geoffrey E. Hinton,et al.  Reducing the Dimensionality of Data with Neural Networks , 2006, Science.

[34]  Geoffrey E. Hinton,et al.  An Efficient Learning Procedure for Deep Boltzmann Machines , 2012, Neural Computation.

[35]  Geoffrey E. Hinton,et al.  3D Object Recognition with Deep Belief Nets , 2009, NIPS.