Decomposition and Extraction: A New Framework for Visual Classification

In this paper, we present a novel framework for visual classification based on hierarchical image decomposition and hybrid midlevel feature extraction. Unlike most midlevel feature learning methods, which focus on the coding or pooling process, we emphasize that the mechanism of image composition also strongly influences feature extraction. To explore the image content effectively for feature extraction, we model a multiplicity feature representation mechanism through meaningful hierarchical image decomposition followed by a fusion step. In particular, we first propose a new hierarchical image decomposition approach in which each image is decomposed into a series of hierarchical semantic components, i.e., the structure and texture images. Then, different feature extraction schemes can be adopted to match the decomposed structure and texture components in a dissociative manner. Here, two schemes are explored to produce property-related feature representations. One is based on a single-stage network over hand-crafted features, and the other is based on a multistage network that learns features automatically from raw pixels. Finally, these multiple midlevel features are combined by solving a multiple kernel learning task. Extensive experiments are conducted on several challenging data sets for visual classification, and the experimental results demonstrate the effectiveness of the proposed method.
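
The pipeline described above (decompose, extract features per component, fuse kernels) can be illustrated with a minimal sketch. It is not the authors' implementation. It rests on several assumptions: a plain box filter stands in for the paper's decomposition operator, the scale radii are arbitrary, and the kernel weights are fixed by hand rather than learned by multiple kernel learning. The names `box_smooth`, `hierarchical_decompose`, and `fuse_kernels` are hypothetical.

```python
import numpy as np

def box_smooth(img, radius):
    """Mean filter with edge padding (assumed stand-in for the paper's smoother)."""
    k = 2 * radius + 1
    padded = np.pad(img, radius, mode="edge")
    out = np.zeros_like(img, dtype=float)
    # Sum the k*k shifted copies of the image, then average.
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def hierarchical_decompose(img, radii=(1, 2, 4)):
    """Split img into one coarse structure image and a list of texture residuals.

    At each level the current structure is smoothed; the residual is kept as
    that level's texture. By construction, img == structure + sum(textures).
    """
    structure = img.astype(float)
    textures = []
    for r in radii:  # radii are illustrative, not taken from the paper
        smoother = box_smooth(structure, r)
        textures.append(structure - smoother)
        structure = smoother
    return structure, textures

def fuse_kernels(kernels, weights):
    """Convex combination of Gram matrices with FIXED weights.

    A real multiple kernel learning solver would learn these weights.
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return sum(wi * K for wi, K in zip(w, kernels))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.random((32, 32))
    structure, textures = hierarchical_decompose(image)
    # The components add back up to the original image.
    print(np.allclose(structure + sum(textures), image))
```

In a fuller version, each component would pass through its own feature extractor, one kernel matrix would be built per feature type, and the fused kernel would be handed to a kernel classifier such as an SVM with a precomputed kernel.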