Heterogeneous feature machines for visual recognition

With the recent efforts made by computer vision researchers, more and more types of features have been designed to describe various aspects of visual characteristics. Modeling such heterogeneous features has become an increasingly critical issue. In this paper, we propose a machinery called the Heterogeneous Feature Machine (HFM) to effectively solve visual recognition tasks in need of multiple types of features. Our HFM builds a kernel logistic regression model based on similarities that combine different features and distance metrics. Different from existing approaches that use a linear weighting scheme to combine different features, HFM does not require the weights to remain the same across different samples, and therefore can effectively handle features of different types with different metrics. To prevent the model from overfitting, we employ the so-called group LASSO constraints to reduce model complexity. In addition, we propose a fast algorithm based on co-ordinate gradient descent to efficiently train a HFM. The power of the proposed scheme is demonstrated across a wide variety of visual recognition tasks including scene, event and action recognition.

[1]  G. Wahba,et al.  Some results on Tchebycheffian spline functions , 1971 .

[2]  E.J. Candes,et al.  An Introduction To Compressive Sampling , 2008, IEEE Signal Processing Magazine.

[3]  Yves Grandvalet,et al.  Y.: SimpleMKL , 2008 .

[4]  Zihan Zhou,et al.  Demo: Robust face recognition via sparse representation , 2008, 2008 8th IEEE International Conference on Automatic Face & Gesture Recognition.

[5]  Yihong Gong,et al.  Linear spatial pyramid matching using sparse coding for image classification , 2009, CVPR.

[6]  Carl Tim Kelley,et al.  Iterative methods for optimization , 1999, Frontiers in applied mathematics.

[7]  Fei-Fei Li,et al.  What, where and who? Classifying events by scene and object recognition , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[8]  Leo Breiman,et al.  Random Forests , 2001, Machine Learning.

[9]  Jitendra Malik,et al.  Recognizing action at a distance , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[10]  Andrew Zisserman,et al.  Representing shape with a spatial pyramid kernel , 2007, CIVR '07.

[11]  Cordelia Schmid,et al.  Learning realistic human actions from movies , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[12]  James W. Davis,et al.  The Recognition of Human Movement Using Temporal Templates , 2001, IEEE Trans. Pattern Anal. Mach. Intell..

[13]  Willem Stuursma Image classification using ROIs and Multiple Kernel Learning , 2009 .

[14]  Ian H. Witten,et al.  Data mining in bioinformatics using Weka , 2004, Bioinform..

[15]  Catherine Blake,et al.  UCI Repository of machine learning databases , 1998 .

[16]  Paul Tseng,et al.  A coordinate gradient descent method for nonsmooth separable minimization , 2008, Math. Program..

[17]  Ethem Alpaydin,et al.  Localized multiple kernel learning , 2008, ICML '08.

[18]  Volker Roth,et al.  The generalized LASSO , 2004, IEEE Transactions on Neural Networks.

[19]  Martial Hebert,et al.  Event Detection in Crowded Videos , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[20]  Jiri Matas,et al.  Robust wide-baseline stereo from maximally stable extremal regions , 2004, Image Vis. Comput..

[21]  P. Bühlmann,et al.  The group lasso for logistic regression , 2008 .

[22]  David G. Lowe,et al.  Object recognition from local scale-invariant features , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[23]  Allen Y. Yang,et al.  Robust Face Recognition via Sparse Representation , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[24]  Robert Tibshirani,et al.  1-norm Support Vector Machines , 2003, NIPS.

[25]  Mingjing Li,et al.  Color texture moments for content-based image retrieval , 2002, Proceedings. International Conference on Image Processing.

[26]  Martial Hebert,et al.  Discriminative Sparse Image Models for Class-Specific Edge Detection and Image Interpretation , 2008, ECCV.

[27]  Manik Varma,et al.  Learning The Discriminative Power-Invariance Trade-Off , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[28]  Nello Cristianini,et al.  Learning the Kernel Matrix with Semidefinite Programming , 2002, J. Mach. Learn. Res..

[29]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[30]  Matti Pietikäinen,et al.  Multiresolution Gray-Scale and Rotation Invariant Texture Classification with Local Binary Patterns , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[31]  Jiebo Luo,et al.  Selective hidden random fields: Exploiting domain-specific saliency for event classification , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[32]  Jitendra Malik,et al.  Shape Context: A New Descriptor for Shape Matching and Object Recognition , 2000, NIPS.

[33]  M. Yuan,et al.  Model selection and estimation in regression with grouped variables , 2006 .

[34]  Francis R. Bach,et al.  Consistency of the group Lasso and multiple kernel learning , 2007, J. Mach. Learn. Res..

[35]  Michael I. Jordan,et al.  Multiple kernel learning, conic duality, and the SMO algorithm , 2004, ICML.

[36]  Antonio Torralba,et al.  Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope , 2001, International Journal of Computer Vision.

[37]  Gunnar Rätsch,et al.  Large Scale Multiple Kernel Learning , 2006, J. Mach. Learn. Res..

[38]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).