Cosine Activation in Compact Network (CACN): Application to Scene Classification

In this paper, we propose a new learning architecture named cosine activation in a compact network (CACN). The CACN is derived from kernel approximation and consists of a nonlinear hidden layer with a cosine activation function. Because it inherits the fusion ability of kernel approximation while learning its parameters in a supervised way, the CACN is well suited to scene classification. It connects seamlessly to convolutional neural networks (CNNs), which makes it easy to construct an end-to-end network. To compensate for the loss of spatial layout information in CNNs, the CACN is further combined with spatial pyramid matching to fuse the various sources of information into one holistic representation. Experiments on the MIT indoor and SUN397 datasets show that the CACN delivers high performance, demonstrating its effectiveness for scene-classification tasks.
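As an illustration of the idea of a cosine hidden layer derived from kernel approximation, the following is a minimal sketch and not the paper's implementation. It assumes the layer takes the random Fourier feature form z(x) = sqrt(2/D) cos(xW + b), which approximates a Gaussian kernel when W and b are drawn at random. The function names, the bandwidth `sigma`, and the layer sizes are hypothetical. In the CACN the parameters are learned in a supervised way; here they are only initialized randomly and the forward pass is checked against the exact kernel.

```python
import numpy as np

def init_cosine_layer(in_dim, out_dim, sigma=1.0, seed=0):
    """Random initialization (assumption): W ~ N(0, 1/sigma^2), b ~ U[0, 2*pi]."""
    rng = np.random.default_rng(seed)
    W = rng.normal(0.0, 1.0 / sigma, size=(in_dim, out_dim))
    b = rng.uniform(0.0, 2.0 * np.pi, size=out_dim)
    return W, b

def cosine_layer(X, W, b):
    """Hidden layer with cosine activation: sqrt(2/D) * cos(XW + b)."""
    out_dim = W.shape[1]
    return np.sqrt(2.0 / out_dim) * np.cos(X @ W + b)

def gaussian_kernel(X, Y, sigma=1.0):
    """Exact Gaussian kernel exp(-||x - y||^2 / (2 sigma^2)) for comparison."""
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-sq / (2.0 * sigma**2))

# Toy inputs standing in for CNN features (hypothetical sizes).
rng = np.random.default_rng(1)
X = rng.normal(size=(5, 8))

W, b = init_cosine_layer(in_dim=8, out_dim=4096, sigma=2.0)
Z = cosine_layer(X, W, b)

# Inner products of the cosine features approximate the kernel matrix.
approx = Z @ Z.T
exact = gaussian_kernel(X, X, sigma=2.0)
max_err = float(np.max(np.abs(approx - exact)))
print("max abs error:", max_err)
```

In an end-to-end network, `W` and `b` would be treated as trainable parameters and updated by backpropagation together with the CNN weights, with the random draw above serving only as a starting point.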
