Modality-Invariant Image Classification Based on Modality Uniqueness and Dictionary Learning

We present a unified framework for classifying image sets captured under varying modality conditions. Our method is motivated by the key observation that the image feature distribution is influenced simultaneously by the semantic-class label and the modality category label, which limits the performance of conventional methods on this task. Building on this insight, we introduce modality uniqueness, a discriminative weight that separates each modality cluster from all other clusters. Leveraging modality uniqueness, we formulate our framework as unsupervised modality clustering and classifier learning based on a modality-invariant similarity kernel. Specifically, in the assignment step, each training image is assigned to the most similar cluster according to its modality. In the update step, the modality uniqueness and the sparse dictionary are updated based on the current cluster hypothesis. These two steps are repeated iteratively. From the final clusters, a modality-invariant marginalized kernel is then computed by aggregating, across all clusters, the similarities between the reconstructed features of each modality. Our framework enables reliable inference of the semantic-class category of an image, even under large photometric variations. Experimental results show that our method outperforms conventional methods on various benchmarks, including landmark identification under severely varying weather conditions, domain-adaptive image classification, and RGB and near-infrared image classification.
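The alternating procedure and the marginalized kernel described above can be illustrated with a rough sketch. This is not the authors' implementation: the abstract gives no formulas, so every name here (`fit_modality_clusters`, `soft_assignments`, `marginalized_kernel`, `temperature`, `n_atoms`) is hypothetical. A truncated SVD basis stands in for the sparse dictionary, the modality uniqueness weight is omitted, and the soft cluster posterior is an assumed softmax over reconstruction errors.

```python
import numpy as np


def reconstruct(X, mean, basis):
    # Project features onto a cluster's basis and map back to feature space.
    return (X - mean) @ basis.T @ basis + mean


def fit_modality_clusters(X, n_clusters=2, n_atoms=4, n_iters=10, seed=0):
    """Alternate between refitting per-cluster bases and reassigning samples.

    Assumption: an SVD basis replaces the sparse dictionary of the paper.
    """
    rng = np.random.default_rng(seed)
    labels = rng.integers(n_clusters, size=len(X))
    bases = []
    for _ in range(n_iters):
        # Update step: refit one basis per cluster from its current members.
        bases = []
        for k in range(n_clusters):
            members = X[labels == k]
            if len(members) == 0:
                # Re-seed an empty cluster with one random sample.
                members = X[rng.integers(len(X), size=1)]
            mean = members.mean(axis=0)
            _, _, vt = np.linalg.svd(members - mean, full_matrices=False)
            bases.append((mean, vt[:n_atoms]))
        # Assignment step: pick the cluster with the lowest reconstruction error.
        errors = np.stack(
            [np.sum((X - reconstruct(X, m, B)) ** 2, axis=1) for m, B in bases],
            axis=1,
        )
        new_labels = errors.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels, bases


def soft_assignments(X, bases, temperature=1.0):
    # Assumed posterior p(k | x): softmax over negative reconstruction errors.
    errors = np.stack(
        [np.sum((X - reconstruct(X, m, B)) ** 2, axis=1) for m, B in bases],
        axis=1,
    )
    logits = -errors / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)


def marginalized_kernel(XA, XB, bases, temperature=1.0):
    """K(x, y) = sum_k p(k|x) p(k|y) <x_k, y_k>, with x_k the reconstruction
    of x under cluster k (a linear base kernel is assumed here)."""
    PA = soft_assignments(XA, bases, temperature)
    PB = soft_assignments(XB, bases, temperature)
    K = np.zeros((len(XA), len(XB)))
    for k, (mean, basis) in enumerate(bases):
        RA = reconstruct(XA, mean, basis)
        RB = reconstruct(XB, mean, basis)
        K += np.outer(PA[:, k], PB[:, k]) * (RA @ RB.T)
    return K
```

A kernel matrix built this way could be passed to any classifier that accepts precomputed kernels, such as an SVM. Because each summand is an elementwise product of two positive semidefinite matrices, the aggregated kernel stays positive semidefinite under these assumptions.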
