Supervised descriptor learning for multi-output regression

Descriptor learning has recently drawn increasing attention in computer vision, Existing algorithms are mainly developed for classification rather than for regression which however has recently emerged as a powerful tool to solve a broad range of problems, e.g., head pose estimation. In this paper, we propose a novel supervised descriptor learning (SDL) algorithm to establish a discriminative and compact feature representation for multi-output regression. By formulating as generalized low-rank approximations of matrices with a supervised manifold regularization (SMR), the SDL removes irrelevant and redundant information from raw features by transforming into a low-dimensional space under the supervision of multivariate targets. The obtained discriminative while compact descriptor largely reduces the variability and ambiguity in multi-output regression, and therefore enables more accurate and efficient multivariate estimation. We demonstrate the effectiveness of the proposed SDL algorithm on a representative multi-output regression task: head pose estimation using the benchmark Pointing'04 dataset. Experimental results show that the SDL can achieve high pose estimation accuracy and significantly outperforms state-of-the-art algorithms by an error reduction up to 27.5%. The proposed SDL algorithm provides a general descriptor learning framework in a supervised way for multi-output regression which can largely boost the performance of existing multi-output regression tasks.

[1]  Bodo Rosenhahn,et al.  Class Generative Models Based on Feature Regression for Pose Estimation of Object Categories , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[2]  Xiantong Zhen,et al.  Direct Estimation of Cardiac Bi-ventricular Volumes with Regression Forests , 2014, MICCAI.

[3]  Xin Geng,et al.  Head Pose Estimation Based on Multivariate Label Distribution , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[4]  Matti Pietikäinen,et al.  Computer Vision Using Local Binary Patterns , 2011, Computational Imaging and Vision.

[5]  Larry S. Davis,et al.  On partial least squares in head pose estimation: How to simultaneously deal with misalignment , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[6]  Christian Szegedy,et al.  DeepPose: Human Pose Estimation via Deep Neural Networks , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[7]  Mikhail Belkin,et al.  Manifold Regularization: A Geometric Framework for Learning from Labeled and Unlabeled Examples , 2006, J. Mach. Learn. Res..

[8]  Jieping Ye,et al.  GPCA: an efficient dimension reduction scheme for image compression and retrieval , 2004, KDD.

[9]  J. Crowley,et al.  Estimating Face orientation from Robust Detection of Salient Facial Structures , 2004 .

[10]  Dieter Fox,et al.  Kernel Descriptors for Visual Recognition , 2010, NIPS.

[11]  Andrea Vedaldi,et al.  Vlfeat: an open and portable library of computer vision algorithms , 2010, ACM Multimedia.

[12]  Ling Shao,et al.  Discriminative Embedding via Image-to-Class Distances , 2014, BMVC.

[13]  Stefanos Zafeiriou,et al.  Subspace Learning from Image Gradient Orientations , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[14]  Hongping Cai,et al.  Learning Linear Discriminant Projections for Dimensionality Reduction of Image Descriptors , 2011, IEEE Trans. Pattern Anal. Mach. Intell..

[15]  Subhransu Maji,et al.  Knowing a Good HOG Filter When You See It: Efficient Selection of Filters for Detection , 2014, ECCV.

[16]  Matthijs C. Dorst Distinctive Image Features from Scale-Invariant Keypoints , 2011 .

[17]  Geoffrey E. Hinton,et al.  Visualizing Data using t-SNE , 2008 .

[18]  Hal Daumé,et al.  Simultaneously Leveraging Output and Task Structures for Multiple-Output Regression , 2012, NIPS.

[19]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[20]  Zhenyue Zhang,et al.  Low-Rank Matrix Approximation with Manifold Regularization , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[21]  Yong Yu,et al.  Multi-output regression on the output manifold , 2009, Pattern Recognit..

[22]  Antonio Torralba,et al.  Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope , 2001, International Journal of Computer Vision.

[23]  Hongbin Zha,et al.  Supervised Kernel Descriptors for Visual Recognition , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[24]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[25]  Xiaofei He,et al.  Locality Preserving Projections , 2003, NIPS.

[26]  Matti Pietikäinen,et al.  Learning Discriminant Face Descriptor , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[27]  Ahmed M. Elgammal,et al.  Regression from local features for viewpoint and pose estimation , 2011, 2011 International Conference on Computer Vision.

[28]  Andrew W. Fitzgibbon,et al.  Multi-output Learning for Camera Relocalization , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[29]  Kyung-Ah Sohn,et al.  Joint Estimation of Structured Sparsity and Output Structure in Multiple-Output Regression via Inverse-Covariance Regularization , 2012, AISTATS.

[30]  Rama Chellappa,et al.  Growing Regression Forests by Classification: Applications to Object Pose Estimation , 2013, ECCV.

[31]  Matti Pietikäinen,et al.  Discriminative features for texture description , 2012, Pattern Recognit..

[32]  Terry M. Peters,et al.  Regional Assessment of Cardiac Left Ventricular Myocardial Function via MRI Statistical Features , 2014, IEEE Transactions on Medical Imaging.

[33]  Andrew W. Fitzgibbon,et al.  Scene Coordinate Regression Forests for Camera Relocalization in RGB-D Images , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[34]  Paul Mineiro,et al.  Discriminative Features via Generalized Eigenvectors , 2013, ICML.

[35]  Bernhard Schölkopf,et al.  A tutorial on support vector regression , 2004, Stat. Comput..

[36]  Andrew Zisserman,et al.  Learning Local Feature Descriptors Using Convex Optimisation , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[37]  Chiraz BenAbdelkader Robust Head Pose Estimation Using Supervised Manifold Learning , 2010, ECCV.

[38]  Jieping Ye,et al.  Generalized Low Rank Approximations of Matrices , 2004, Machine Learning.