Rank Preserving Sparse Learning for Kinect Based Scene Classification

With the rapid development of RGB-D sensors and the rapidly growing popularity of the low-cost Microsoft Kinect sensor, scene classification, a hard yet important problem in computer vision, has recently attracted renewed interest. This is because the depth information provided by the Kinect sensor opens an effective and innovative way to classify scenes. In this paper, we propose a new scheme for scene classification that applies locality-constrained linear coding (LLC) to local SIFT features to represent the RGB-D samples, and classifies scenes by combining a new rank preserving sparse learning (RPSL) based dimension reduction with a simple classification method. RPSL considers four aspects: 1) it preserves the rank order information of the within-class samples in a local patch; 2) it maximizes the margin between samples of different classes on the local patch; 3) it introduces an L1-norm penalty to obtain the parsimony property; and 4) it models classification error minimization as least-squares error minimization. Experiments conducted on the NYU Depth V1 dataset demonstrate the robustness and effectiveness of RPSL for scene classification.
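To make the representation step more concrete, the following is a minimal sketch of the approximated (k-nearest-neighbor) form of LLC followed by max pooling, assuming that local descriptors (for example, 128-dimensional SIFT vectors) and a codebook have already been computed. The function names `llc_encode` and `max_pool`, the parameter values `k` and `beta`, and the small ridge term are illustrative assumptions and are not taken from the paper; the RPSL dimension reduction itself is not reproduced here.

```python
import numpy as np

def llc_encode(descriptors, codebook, k=5, beta=1e-4):
    """Approximated LLC: code each descriptor over its k nearest codewords.

    descriptors: (n, d) array of local features (hypothetical input).
    codebook:    (m, d) array of codewords (hypothetical input).
    Returns an (n, m) array of codes; each row sums to 1.
    """
    n, m = descriptors.shape[0], codebook.shape[0]
    # Squared Euclidean distance from every descriptor to every codeword.
    d2 = (
        (descriptors ** 2).sum(axis=1, keepdims=True)
        - 2.0 * descriptors @ codebook.T
        + (codebook ** 2).sum(axis=1)
    )
    nearest = np.argsort(d2, axis=1)[:, :k]
    codes = np.zeros((n, m))
    ones = np.ones(k)
    for i in range(n):
        idx = nearest[i]
        # Shift the selected codewords so the descriptor is the origin.
        z = codebook[idx] - descriptors[i]
        cov = z @ z.T
        # Regularize for numerical stability (beta and the ridge are assumptions).
        cov += np.eye(k) * (beta * np.trace(cov) + 1e-12)
        w = np.linalg.solve(cov, ones)
        # Enforce the sum-to-one constraint on the code.
        codes[i, idx] = w / w.sum()
    return codes

def max_pool(codes):
    """Pool the codes of one image (or region) into a single feature vector."""
    return codes.max(axis=0)
```

In a pipeline like the one described above, the pooled vectors from the RGB and depth channels would be the inputs to the dimension reduction stage, and the reduced features would then go to a simple classifier.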
