Exploring confusing scene classes for the Places dataset: Insights and solutions

Scene classification is more challenging than object classification because scene labels are more ambiguous. In this work, we propose to use the filter weights at the last stage of a CNN model trained on the Places dataset, known as "scene anchor vectors (SAVs)", to explain the source of confusion. Each SAV points to a cluster of images. If two anchor vectors are separated by a small angle, their image clusters overlap, which leads to a set of confusing classes. To overcome this, we propose to merge the images associated with confusing anchor vectors into a confusion set and then split the set in an unsupervised fashion into multiple subsets, a process called "automatic subset clustering (ASC)". Each of these subsets contains scene images of strong visual similarity. After the ASC process, we train a random forest (RF) classifier for each confusion subset to allow better scene classification. The ASC/RF scheme can be added on top of any existing scene-classification CNN as a post-processing module with little extra training effort. Extensive experimental results show that, for a given baseline CNN, the ASC/RF scheme can offer a significant performance gain.
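The angle criterion on anchor vectors can be illustrated with a minimal sketch. It assumes the SAVs are available as the rows of the last-layer weight matrix of a trained CNN. The function names, the angle threshold, and the grouping by connected components (union-find) are hypothetical choices made for illustration; the abstract does not specify how confusing anchor vectors are grouped, and the toy weights below are not taken from any real model.

```python
import numpy as np

def pairwise_angles(weights):
    """Return the angle in degrees between every pair of rows (anchor vectors)."""
    unit = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    cos = np.clip(unit @ unit.T, -1.0, 1.0)  # clip guards against rounding error
    return np.degrees(np.arccos(cos))

def confusion_sets(weights, max_angle_deg):
    """Group class indices whose anchor vectors are closer than max_angle_deg.

    Assumption: classes are linked transitively (connected components),
    which is one possible reading of "a set of confusing classes".
    """
    angles = pairwise_angles(weights)
    n = len(weights)
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if angles[i, j] < max_angle_deg:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    # Keep only groups with at least two classes; singletons are not confusing.
    return sorted(g for g in groups.values() if len(g) > 1)

# Toy anchor vectors for five hypothetical classes in a 3-D feature space.
toy_weights = np.array([
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],   # close to class 0
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.1, 0.9],   # close to class 3
])
sets = confusion_sets(toy_weights, max_angle_deg=20.0)
print(sets)
```

In a full pipeline, the images assigned to each returned group would then be pooled, clustered without labels, and passed to a per-subset random forest, as the abstract describes; those steps are left out of this sketch.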
