Paper information - Improved spatial pyramid matching for scene recognition

Improved spatial pyramid matching for scene recognition

Abstract: A scene image is typically composed of successive background contexts and objects with regular shapes. To acquire such spatial information, we propose a new type of spatial partitioning scheme and a modified pyramid matching kernel based on spatial pyramid matching (SPM). A dense histogram of oriented gradients (HOG) is used as a low-level visual descriptor. Furthermore, inspired by the expressive coding ability of autoencoders, we also propose another approach that encodes local descriptors into mid-level features using various autoencoders. The learned mid-level features are encouraged to be sparse, robust and contractive. Then, modified spatial pyramid pooling and local normalization of the mid-level features facilitate the generation of high-level image signatures for scene classification. Comprehensive experimental results on publicly available scene datasets demonstrate the effectiveness of our methods.
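For readers unfamiliar with the baseline that the abstract builds on, the following is a minimal sketch of standard spatial pyramid matching, not the modified partitioning scheme or kernel proposed in the paper (the abstract does not specify those). It assumes local descriptors have already been quantized into visual-word indices and that keypoint coordinates are normalized to [0, 1). The function names `spatial_pyramid_histogram` and `histogram_intersection` are hypothetical, chosen for illustration only.

```python
def spatial_pyramid_histogram(points, words, vocab_size, levels=2):
    """Build a weighted, concatenated visual-word histogram over a pyramid.

    points:     list of (x, y) keypoint coordinates, normalized to [0, 1)
    words:      list of integer visual-word indices in [0, vocab_size)
    vocab_size: size of the visual vocabulary (assumed known)
    levels:     L, the finest level; level l uses a 2**l x 2**l grid
    """
    parts = []
    for level in range(levels + 1):
        cells = 2 ** level
        # Standard SPM weights: 1/2^L at level 0, 1/2^(L-l+1) at level l >= 1,
        # so that matches found in finer cells count for more.
        if level == 0:
            weight = 1.0 / 2 ** levels
        else:
            weight = 1.0 / 2 ** (levels - level + 1)
        hist = [0.0] * (cells * cells * vocab_size)
        for (x, y), w in zip(points, words):
            # Locate the grid cell containing this keypoint (clamped to the grid).
            cx = min(int(x * cells), cells - 1)
            cy = min(int(y * cells), cells - 1)
            hist[(cy * cells + cx) * vocab_size + w] += weight
        parts.extend(hist)
    return parts


def histogram_intersection(h1, h2):
    """Histogram intersection of two equal-length pyramid histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


# Two toy "images" with two keypoints each and a vocabulary of two words.
a = spatial_pyramid_histogram([(0.1, 0.1), (0.9, 0.9)], [0, 1], vocab_size=2, levels=1)
b = spatial_pyramid_histogram([(0.1, 0.1), (0.1, 0.9)], [0, 1], vocab_size=2, levels=1)
similarity = histogram_intersection(a, b)
```

Because the level weights are folded into the histograms, the intersection of the concatenated vectors equals the weighted sum of per-level intersections. In the toy example both images share the same words globally, but only one keypoint falls in the same cell at level 1, so the second image matches the first less well than the first matches itself.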
