Paper information - DLANet: A manifold-learning-based discriminative feature learning network for scene classification

Abstract: This paper presents the Discriminative Locality Alignment Network (DLANet), a novel manifold-learning-based discriminative learnable feature for scene classification in the wild. Built on a convolutional structure, DLANet learns the filters of multiple layers by applying Discriminative Locality Alignment (DLA), then uses block-wise histograms of the binary codes of the feature maps to generate local descriptors. Within a local region, a DLA layer maximizes the margin between inter-class patches and minimizes the distance between intra-class patches. In particular, we construct a two-layer DLANet by stacking two DLA layers and a feature layer. Its output is passed to a popular scene classification framework that combines Locality-constrained Linear Coding–Spatial Pyramid Matching (LLC–SPM) with a linear Support Vector Machine (SVM). We evaluate DLANet on NYU Depth V1, Scene-15, and MIT Indoor-67. Experiments show that DLANet performs well on depth images. It outperforms carefully tuned features, including SIFT, and is competitive with other reported methods.
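
The feature layer described in the abstract (binary codes of feature maps, summarized by block-wise histograms) can be illustrated with a minimal sketch. The abstract does not give the details, so everything specific here is an assumption: thresholding at zero, packing K maps into a K-bit integer code, and using non-overlapping blocks. The function names `binary_code_map` and `block_histograms` are hypothetical, and random arrays stand in for the responses of learned DLA filters.

```python
import numpy as np


def binary_code_map(feature_maps):
    """Pack K real-valued feature maps into one integer-coded map.

    feature_maps has shape (K, H, W). Each map is thresholded at zero
    (an assumption, not stated in the abstract) and used as one bit.
    """
    k = feature_maps.shape[0]
    bits = (feature_maps > 0).astype(np.int64)       # (K, H, W), values 0 or 1
    weights = (2 ** np.arange(k)).reshape(k, 1, 1)   # map i contributes 2**i
    return (bits * weights).sum(axis=0)              # (H, W), values in [0, 2**K - 1]


def block_histograms(code_map, block, n_bins):
    """Histogram the codes inside non-overlapping block x block regions."""
    h, w = code_map.shape
    descriptors = []
    for top in range(0, h - block + 1, block):
        for left in range(0, w - block + 1, block):
            patch = code_map[top:top + block, left:left + block]
            descriptors.append(np.bincount(patch.ravel(), minlength=n_bins))
    return np.stack(descriptors)                     # (num_blocks, n_bins)


# Hypothetical input: K = 4 filter responses on an 8 x 8 image.
rng = np.random.default_rng(0)
maps = rng.standard_normal((4, 8, 8))
codes = binary_code_map(maps)
local_descriptors = block_histograms(codes, block=4, n_bins=2 ** 4)
print(local_descriptors.shape)  # (4, 16): four blocks, sixteen possible codes
```

Each row of `local_descriptors` is one local descriptor. In the pipeline the abstract outlines, such descriptors would then be encoded with LLC–SPM and classified with a linear SVM; those stages are not shown here.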
