Deep Spatial Pyramid Match Kernel for Scene Classification

Several works have shown that Convolutional Neural Networks (CNNs) can be easily adapted to different datasets and tasks. However, for extracting the deep features from these pre-trained deep CNNs a fixedsize (e.g., 227×227) input image is mandatory. Now the state-of-the-art datasets like MIT-67 and SUN-397 come with images of different sizes. Usage of CNNs for these datasets enforces the user to bring different sized images to a fixed size either by reducing or enlarging the images. The curiosity is obvious that “Isn’t the conversion to fixed size image is lossy ?”. In this work, we provide a mechanism to keep these lossy fixed size images aloof and process the images in its original form to get set of varying size deep feature maps, hence being lossless. We also propose deep spatial pyramid match kernel (DSPMK) which amalgamates set of varying size deep feature maps and computes a matching score between the samples. Proposed DSPMK act as a dynamic kernel in the classification framework of scene dataset using support vector machine. We demonstrated the effectiveness of combining the power of varying size CNN-based set of deep feature maps with dynamic kernel by achieving state-of-the-art results for high-level visual recognition tasks such as scene classification on standard datasets like MIT67 and SUN397.

[1]  Jian Sun,et al.  Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[2]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[3]  Chellu Chandra Sekhar,et al.  GMM-Based Intermediate Matching Kernel for Classification of Varying Length Patterns of Long Duration Speech Using Support Vector Machines , 2014, IEEE Transactions on Neural Networks and Learning Systems.

[4]  Cordelia Schmid,et al.  Aggregating local descriptors into a compact image representation , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[5]  Chih-Jen Lin,et al.  LIBSVM: A library for support vector machines , 2011, TIST.

[6]  Xiu-Shen Wei,et al.  Deep Spatial Pyramid: The Devil is Once Again in the Details , 2015, ArXiv.

[7]  Bolei Zhou,et al.  Learning Deep Features for Scene Recognition using Places Database , 2014, NIPS.

[8]  In-So Kweon,et al.  Fisher Kernel for Deep Neural Activations , 2014, ArXiv.

[9]  Bernt Schiele,et al.  Natural Scene Retrieval Based on a Semantic Modeling Step , 2004, CIVR.

[10]  In-So Kweon,et al.  Multi-scale pyramid pooling for deep convolutional representation , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[11]  Antonio Torralba,et al.  Recognizing indoor scenes , 2009, CVPR.

[12]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[13]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[14]  Shikha Gupta,et al.  Segment-level pyramid match kernels for the classification of varying length patterns of speech using SVMs , 2016, 2016 24th European Signal Processing Conference (EUSIPCO).

[15]  Svetlana Lazebnik,et al.  Multi-scale Orderless Pooling of Deep Convolutional Activation Features , 2014, ECCV.

[16]  Xiaogang Wang,et al.  Fully Convolutional Neural Networks for Crowd Segmentation , 2014, ArXiv.

[17]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  C. Chandra Sekhar,et al.  Dynamic Kernels based Approaches to Analysis of Varying Length Patterns in Speech and Image Processing Tasks , 2017 .

[19]  Krista A. Ehinger,et al.  SUN database: Large-scale scene recognition from abbey to zoo , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[20]  Nuno Vasconcelos,et al.  Scene classification with semantic Fisher vectors , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Antonio Torralba,et al.  Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope , 2001, International Journal of Computer Vision.

[22]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[23]  Shikha Gupta,et al.  Segment-Level Probabilistic Sequence Kernel Based Support Vector Machines for Classification of Varying Length Patterns of Speech , 2016, ICONIP.

[24]  Fei-Fei Li,et al.  ImageNet: A large-scale hierarchical image database , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[25]  C. V. Jawahar,et al.  Blocks That Shout: Distinctive Parts for Scene Classification , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[26]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[27]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[28]  Bolei Zhou,et al.  Places: A 10 Million Image Database for Scene Recognition , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[29]  Tieniu Tan,et al.  Deep semantic ranking based hashing for multi-label image retrieval , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  Andrew Zisserman,et al.  Return of the Devil in the Details: Delving Deep into Convolutional Nets , 2014, BMVC.