In Defense of Gradient-Based Alignment on Densely Sampled Sparse Features

In this chapter, we explore the surprising result that gradient-based continuous optimization methods perform well for the alignment of image/object models when using densely sampled sparse features (HOG, dense SIFT, etc.). Gradient-based approaches for image/object alignment have many desirable properties—inference is typically fast and exact, and diverse constraints can be imposed on the motion of points. However, the presumption that gradients predicted on sparse features would be poor estimators of the true descent direction has meant that gradient-based optimization is often overlooked in favor of graph-based optimization. We show that this intuition is only partly true: sparse features are indeed poor predictors of the error surface, but this has no impact on the actual alignment performance. In fact, for general object categories that exhibit large geometric and appearance variation, sparse features are integral to achieving any convergence whatsoever. How the descent directions are predicted becomes an important consideration for these descriptors. We explore a number of strategies for estimating gradients, and show that estimating gradients via regression in a manner that explicitly handles outliers improves alignment performance substantially. To illustrate the general applicability of gradient-based methods to the alignment of challenging object categories, we perform unsupervised ensemble alignment on a series of nonrigid animal classes from ImageNet.

[1]  Paul A. Viola,et al.  Robust Real-Time Face Detection , 2001, International Journal of Computer Vision.

[2]  Paul A. Viola,et al.  Robust Real-time Object Detection , 2001 .

[3]  Simon Lucey,et al.  Deformable Model Fitting by Regularized Landmark Mean-Shift , 2010, International Journal of Computer Vision.

[4]  Matthijs C. Dorst Distinctive Image Features from Scale-Invariant Keypoints , 2011 .

[5]  Jiri Matas,et al.  Tracking by an Optimal Sequence of Linear Predictors , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[6]  Sridha Sridharan,et al.  Least-squares congealing for large numbers of images , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[7]  Simon Lucey,et al.  Face alignment through subspace constrained mean-shifts , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[8]  Deva Ramanan,et al.  Face detection, pose estimation, and landmark localization in the wild , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[9]  Antonio Torralba,et al.  SIFT Flow: Dense Correspondence across Scenes and Its Applications , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[10]  Pedro F. Felzenszwalb Representation and detection of deformable shapes , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[11]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[12]  Shai Avidan,et al.  Support Vector Tracking , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[13]  A. Yuille Deformable Templates for Face Recognition , 1991, Journal of Cognitive Neuroscience.

[14]  Fernando De la Torre,et al.  Supervised Descent Method and Its Applications to Face Alignment , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[15]  Varun Ramakrishna,et al.  Pose Machines: Articulated Pose Estimation via Inference Machines , 2014, ECCV.

[16]  Eero P. Simoncelli,et al.  Natural image statistics and neural representation. , 2001, Annual review of neuroscience.

[17]  Berthold K. P. Horn,et al.  Determining Optical Flow , 1981, Other Conferences.

[18]  Jitendra Malik,et al.  Robust computation of optical flow in a multi-scale differential framework , 2005, International Journal of Computer Vision.

[19]  Alexander J. Smola,et al.  Support Vector Regression Machines , 1996, NIPS.

[20]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[21]  Mads Nielsen,et al.  TV-L1 Optical Flow for Vector Valued Images , 2011, EMMCVPR.

[22]  Simon Baker,et al.  Lucas-Kanade 20 Years On: A Unifying Framework , 2004, International Journal of Computer Vision.

[23]  Jean-Luc Starck,et al.  Compressed sensing reconstruction of convolved sparse signals , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[24]  Cordelia Schmid,et al.  DeepFlow: Large Displacement Optical Flow with Deep Matching , 2013, 2013 IEEE International Conference on Computer Vision.

[25]  Michael J. Black,et al.  EigenTracking: Robust Matching and Tracking of Articulated Objects Using a View-Based Representation , 1996, International Journal of Computer Vision.

[26]  Takeo Kanade,et al.  An Iterative Image Registration Technique with an Application to Stereo Vision , 1981, IJCAI.

[27]  Timothy F. Cootes,et al.  Active Appearance Models , 2001, IEEE Trans. Pattern Anal. Mach. Intell..

[28]  Richard Bowden,et al.  Mutual Information for Lucas-Kanade Tracking (MILK): An Inverse Compositional Formulation , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.