Theory and Practice of Hierarchical Data-driven Descent for Optimal Deformation Estimation

Real-world surfaces such as clothing, water, and the human body deform in complex ways. Estimating deformation parameters accurately and reliably is hard because the problem is high-dimensional and non-convex. Optimization-based approaches require good initialization, while regression-based approaches need a large amount of training data. The recently proposed data-driven descent (Tian and Narasimhan in Int J Comput Vis, 98:279–302, 2012) achieves globally optimal estimation by applying nearest neighbor estimators, trained on a particular distribution of training samples, to obtain a dense deformation field between a template and a distorted image. In this work, we develop a hierarchical structure that first applies nearest neighbor estimators iteratively on the entire image to obtain a rough estimate, and then applies estimators with local image support to refine it. Compared to its non-hierarchical version, our approach retains the theoretical guarantees with significantly fewer training samples, is faster by several orders of magnitude, provides a better metric for deciding whether a given image requires more (or fewer) samples, and can handle more complex scenes that include a mixture of global motion and local deformation. We demonstrate in both simulation and real experiments that the proposed algorithm successfully tracks a broad range of non-rigid scenes, including water, clothing, and medical images, and compares favorably against several other deformation estimation and tracking approaches that do not provide optimality guarantees.
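
The coarse-to-fine idea can be illustrated with a toy sketch. It is not the authors' implementation: it assumes a 1-D signal, a deformation that is a piecewise-constant shift, and a deterministic grid of synthesized samples in place of randomly drawn training data. The names `template`, `nn_descent`, and `hierarchical_estimate`, and all numeric settings, are hypothetical choices made for illustration.

```python
import numpy as np


def template(x):
    # Hypothetical smooth 1-D "template image" (an assumption for this toy).
    # All shifts below stay well under half its period, so matches are unambiguous.
    return np.sin(x)


def nn_descent(observed, x, start, radius, n_samples=9, n_iters=8, shrink=0.5):
    """Refine a shift estimate by repeated nearest-neighbor lookups.

    Each iteration synthesizes n_samples shifted copies of the template
    around the current estimate, keeps the shift of the copy closest to
    the observed signal, and then shrinks the sampling radius.
    """
    estimate = start
    for _ in range(n_iters):
        candidates = estimate + np.linspace(-radius, radius, n_samples)
        # Synthesized samples: the template warped by each candidate shift.
        samples = template(x[None, :] - candidates[:, None])
        distances = np.sum((samples - observed[None, :]) ** 2, axis=1)
        estimate = candidates[np.argmin(distances)]
        radius *= shrink
    return estimate


def hierarchical_estimate(observed, x, regions, global_radius=1.0, local_radius=0.5):
    # Level 1: a single shift estimated from the entire signal (rough estimate).
    coarse = nn_descent(observed, x, start=0.0, radius=global_radius)
    # Level 2: each region refines the rough estimate using only local support.
    fine = np.array([
        nn_descent(observed[mask], x[mask], start=coarse, radius=local_radius)
        for mask in regions
    ])
    return coarse, fine


# Synthetic test signal: global motion plus a different local offset per region.
x = np.linspace(0.0, 10.0, 401)
regions = [x < 5.0, x >= 5.0]
true_shifts = np.array([0.45, 0.75])
displacement = np.where(regions[0], true_shifts[0], true_shifts[1])
observed = template(x - displacement)

coarse, fine = hierarchical_estimate(observed, x, regions)
print("coarse shift:", coarse)
print("refined shifts:", fine, "true shifts:", true_shifts)
```

In this sketch the global level only needs to bring every region within `local_radius` of its true shift. Each local estimator then searches a much smaller range, which is the intuition behind needing fewer samples than a single estimator covering the full deformation space.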

[1]  Yuandong Tian, et al. Globally Optimal Estimation of Nonrigid Image Distortion, 2012, International Journal of Computer Vision.

[2]  Eli Shechtman, et al. PatchMatch: a randomized correspondence algorithm for structural image editing, 2009, ACM Trans. Graph.

[3]  Thomas Serre, et al. Object recognition with features inspired by visual cortex, 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[4]  Jian Sun, et al. Face Alignment by Explicit Shape Regression, 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[5]  Daniel Rueckert, et al. Nonrigid registration using free-form deformations: application to breast MR images, 1999, IEEE Transactions on Medical Imaging.

[6]  Simon Baker, et al. Lucas-Kanade 20 Years On: A Unifying Framework, 2004, International Journal of Computer Vision.

[7]  Kiriakos N. Kutulakos, et al. Non-rigid structure from locally-rigid motion, 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[8]  Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.

[9]  Carlo Tomasi, et al. Good features to track, 1994, 1994 Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.

[10]  Yichen Wei, et al. Face Alignment by Explicit Shape Regression, 2012, CVPR.

[11]  Nassir Navab, et al. Deformable Template Tracking in 1ms, 2014, BMVC.

[12]  Vincent Lepetit, et al. Closed-Form Solution to Non-rigid 3D Surface Registration, 2008, ECCV.

[13]  Takeo Kanade, et al. An Iterative Image Registration Technique with an Application to Stereo Vision, 1981, IJCAI.

[14]  Luc Van Gool, et al. Optimal Templates for Nonrigid Surface Reconstruction, 2012, ECCV.

[15]  David G. Lowe, et al. Distinctive Image Features from Scale-Invariant Keypoints, 2004, International Journal of Computer Vision.

[16]  Fred L. Bookstein, et al. Principal Warps: Thin-Plate Splines and the Decomposition of Deformations, 1989, IEEE Trans. Pattern Anal. Mach. Intell.

[17]  Andrew W. Fitzgibbon, et al. Real-time human pose recognition in parts from single depth images, 2011, CVPR 2011.

[18]  Cordelia Schmid, et al. Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories, 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[19]  Simon Baker, et al. Active Appearance Models Revisited, 2004, International Journal of Computer Vision.

[20]  Yan Zhou, et al. Shape Prior Modeling Using Sparse Representation and Online Dictionary Learning, 2012, MICCAI.

[21]  Steven S. Beauchemin, et al. The computation of optical flow, 1995, CSUR.

[22]  Ricardo Gutierrez-Osuna, et al. An Iterative Image Registration Technique Using a Scale-Space Model, 2011.

[23]  Nassir Navab, et al. Online Learning of Linear Predictors for Real-Time Tracking, 2012, ECCV.

[24]  Adam Finkelstein, et al. The Generalized PatchMatch Correspondence Algorithm, 2010, ECCV.

[25]  David G. Lowe, et al. Distinctive Image Features from Scale-Invariant Keypoints, 2004.

[26]  Pascal Fua, et al. Convex Optimization for Deformable Surface 3-D Tracking, 2007, 2007 IEEE 11th International Conference on Computer Vision.