Scalable Mobile Visual Classification by Kernel Preserving Projection Over High-Dimensional Features

Scalable mobile visual classification, that is, classifying images and videos over a large semantic space on mobile devices in real time, is an emerging problem given the paradigm shift toward mobile platforms and the explosive growth of visual data. Although servers can now detect thousands of concepts, scalability on mobile devices is handicapped by their severe resource constraints. Yet certain emerging applications require scalable visual classification with prompt response, either to detect local context (e.g., on Google Glass) or to ensure user satisfaction. In this work, we point out the overlooked challenges of scalable mobile visual classification and provide a feasible solution. To overcome these limitations, we propose an unsupervised linear dimension reduction algorithm, kernel preserving projection (KPP), which approximates the kernel matrix of high-dimensional features with a low-dimensional linear embedding. We further introduce sparsity into the projection matrix so that it suits mobile computing (merely 12% of its entries are non-zero). By examining the similarity between linear dimension reduction and a low-rank linear distance metric, together with the Taylor expansion of the RBF kernel, we justify the feasibility of KPP over high-dimensional features. Experimental results on three public datasets confirm that the proposed method outperforms existing dimension reduction methods. Moreover, it greatly reduces storage consumption and computes classification results efficiently on mobile devices.
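The abstract does not give the exact KPP objective or solver. To make the idea concrete, the sketch below assumes one plausible formulation: find a d x k projection W so that the linear Gram matrix of the projected features approximates an RBF kernel matrix, with an l1 penalty that drives entries of W to exactly zero, i.e. `min_W mean((X W W^T X^T - K)^2) + lam * ||W||_1`. The function names, the mean-squared loss, the penalty weight `lam`, and the proximal-gradient solver with backtracking are illustrative assumptions, not the paper's algorithm. The helper `projected_sq_distance` shows the identity behind the "low-rank linear distance metric" view: distances after projecting by W equal a Mahalanobis-style distance with the rank-k matrix W W^T.

```python
import numpy as np


def rbf_kernel(X, gamma):
    """RBF kernel matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2) over the rows of X."""
    sq = np.sum(X * X, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-gamma * np.maximum(d2, 0.0))  # clip tiny negative round-off


def soft_threshold(V, tau):
    """Proximal operator of tau * ||V||_1: shrink every entry toward zero by tau."""
    return np.sign(V) * np.maximum(np.abs(V) - tau, 0.0)


def smooth_loss(X, W, K):
    """Mean squared error between K and the linear Gram matrix (XW)(XW)^T."""
    Z = X @ W
    R = Z @ Z.T - K
    return float(np.mean(R * R)), Z, R


def projected_sq_distance(W, x, y):
    """||W^T x - W^T y||^2, i.e. the low-rank metric (x - y)^T (W W^T) (x - y)."""
    diff = W.T @ (x - y)
    return float(diff @ diff)


def sparse_kernel_projection(X, K, k, lam=1e-3, iters=200, step=1.0, seed=0):
    """Hypothetical objective (not the paper's), solved by proximal gradient:
        min_W  mean((X W W^T X^T - K)^2) + lam * ||W||_1,   W of shape (d, k).
    Returns W and the objective value after each accepted step.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # W = 0 is a stationary point of the quartic loss, so start from a random W.
    W = rng.standard_normal((d, k)) / np.sqrt(d)
    f, Z, R = smooth_loss(X, W, K)
    history = [f + lam * np.abs(W).sum()]
    for _ in range(iters):
        G = 4.0 * X.T @ (R @ Z) / R.size  # gradient of the smooth term (R symmetric)
        accepted = False
        for _bt in range(50):  # backtracking: halve the step until the bound holds
            W_new = soft_threshold(W - step * G, step * lam)
            f_new, Z_new, R_new = smooth_loss(X, W_new, K)
            D = W_new - W
            if f_new <= f + np.sum(G * D) + np.sum(D * D) / (2.0 * step):
                accepted = True
                break
            step *= 0.5
        if not accepted:
            break  # no admissible step found; keep the current W
        W, f, Z, R = W_new, f_new, Z_new, R_new
        history.append(f + lam * np.abs(W).sum())
    return W, history


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.random((60, 50))
    X /= np.linalg.norm(X, axis=1, keepdims=True)  # L2-normalize each feature
    K = rbf_kernel(X, gamma=0.5)
    W, history = sparse_kernel_projection(X, K, k=8, lam=1e-3)
    print(f"objective: {history[0]:.4f} -> {history[-1]:.4f}")
    print(f"fraction of non-zero entries in W: {np.mean(W != 0):.2f}")
```

Backtracking avoids having to know a Lipschitz constant for the quartic loss, and each accepted step cannot increase the penalized objective. Larger values of `lam` generally leave more exact zeros in W, which is the property that makes such a projection cheap to store and apply on a device. For L2-normalized features, a first-order Taylor expansion gives exp(-gamma * ||x - y||^2) ≈ 1 - 2 * gamma + 2 * gamma * x^T y when gamma * ||x - y||^2 is small, so kernel values are roughly affine in inner products; this is one way to read why a linear embedding can preserve an RBF kernel, offered here as an interpretation rather than the paper's derivation.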
