
Hierarchy of Alternating Specialists for Scene Recognition

We introduce a method for improving convolutional neural networks (CNNs) for scene classification. We present a hierarchy of specialist networks that disentangles intra-class variation and inter-class similarity in a coarse-to-fine manner. Our key insight is that different subsets of images within a class are often associated with different types of inter-class similarity. This suggests that existing network-of-experts approaches, which organize classes into coarse categories, are suboptimal. Instead, we group images based on high-level appearance features rather than their class membership, and dedicate a specialist model to each group. In addition, we propose an alternating architecture that uses a global ordered representation and a global orderless representation to account for both the coarse layout of the scene and transient objects. We demonstrate that this leads to better performance than using either type of representation alone or the fused features. We also introduce a mini-batch soft k-means that allows end-to-end fine-tuning, as well as a novel routing function for assigning images to specialists. Experimental results show that the proposed approach achieves a significant improvement over baselines, including existing tree-structured CNNs with class-based grouping.
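The abstract mentions grouping images by appearance features with a mini-batch soft k-means and then routing each image to a specialist. The paper's exact formulation is not reproduced here, so the following is only a minimal, generic sketch under stated assumptions. Feature vectors are plain Python lists. Soft assignments are a softmax over negative squared distances scaled by a hypothetical temperature `beta`. Each centroid is updated with a step size that shrinks as it accumulates soft mass. `route` is a plain argmax rule, not the novel routing function the paper proposes. All function names and parameters are hypothetical, and the sketch does not show the gradient path needed for end-to-end fine-tuning.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def soft_assign(x, centroids, beta):
    """Soft responsibilities r_k = softmax_k(-beta * ||x - mu_k||^2), max-shifted for stability."""
    logits = [-beta * sq_dist(x, c) for c in centroids]
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]


def farthest_point_init(data, k, rng):
    """Seeding (an assumption): one random point, then repeatedly the point farthest from all chosen centroids."""
    centroids = [list(rng.choice(data))]
    while len(centroids) < k:
        far = max(data, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def minibatch_soft_kmeans(data, k, beta=1.0, batch_size=16, n_iters=100, seed=0):
    """Generic mini-batch soft k-means (hypothetical sketch); returns k centroids."""
    rng = random.Random(seed)
    centroids = farthest_point_init(data, k, rng)
    # The initial centroid counts as one pseudo-observation, so a centroid
    # that receives only a tiny soft mass from a batch barely moves.
    counts = [1.0] * k
    dim = len(centroids[0])
    for _ in range(n_iters):
        batch = rng.sample(data, min(batch_size, len(data)))
        resp = [soft_assign(x, centroids, beta) for x in batch]
        for j in range(k):
            mass = sum(r[j] for r in resp)
            if mass == 0.0:
                continue
            # Responsibility-weighted mean of the batch for centroid j.
            batch_mean = [sum(r[j] * x[d] for r, x in zip(resp, batch)) / mass
                          for d in range(dim)]
            counts[j] += mass
            lr = mass / counts[j]  # step size shrinks as soft mass accumulates
            centroids[j] = [(1 - lr) * c + lr * b
                            for c, b in zip(centroids[j], batch_mean)]
    return centroids


def route(x, centroids, beta=1.0):
    """Hard routing (not the paper's rule): pick the centroid with the largest responsibility."""
    resp = soft_assign(x, centroids, beta)
    return max(range(len(resp)), key=resp.__getitem__)
```

For example, with two well-separated groups of 2-D "features", each centroid settles inside its own group, and `route` sends every member of a group to the same specialist index. The soft assignment keeps every centroid's update differentiable with respect to the features, which is the property that makes soft k-means a candidate for end-to-end training. The hard argmax in `route` is used here only for simplicity.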

Jan-Michael Frahm | Hyo Jin Kim
