Paper Information - Theory and Practice of Globally Optimal Deformation Estimation

Nonrigid deformation modeling and estimation from images is a technically challenging task due to its nonlinear, nonconvex and high-dimensional nature. Traditional optimization procedures often rely on good initializations and give only locally optimal solutions. On the other hand, learning-based methods that directly model the relationship between deformed images and their parameters either cannot handle complicated forms of mapping, or suffer from the Nyquist limit and the curse of dimensionality due to the high number of degrees of freedom in the deformation space. In particular, to achieve a worst-case guarantee of $\epsilon$ error for a deformation with $d$ degrees of freedom, the sample complexity required is $O(1/\epsilon^d)$. In this thesis, a generative model for deformation is established and analyzed using a unified theoretical framework. Based on the framework, three algorithms, Data-Driven Descent and the Top-down and Bottom-up Hierarchical Models, are designed and constructed to solve the generative model. Under Lipschitz conditions that rule out unsolvable cases (e.g., deformation of a blank image), all algorithms achieve globally optimal solutions to the specific generative model. The sample complexity of these methods is substantially lower than that of learning-based approaches, which are agnostic to deformation modeling. To achieve global optimality guarantees with lower sample complexity, the structure embedded in the deformation model is exploited. In particular, Data-Driven Descent relates two deformed images that are far apart in the parameter space through the compositional structure of deformation, and reduces the sample complexity to $O(C^d \log 1/\epsilon)$. The Top-down Hierarchical Model factorizes the local deformation into patches once the global deformation has been estimated approximately, and further reduces the sample complexity to $O(C_1^d + C_2 \log 1/\epsilon)$. Finally, the Bottom-up Hierarchical Model builds representations that are invariant to local deformation. With these representations, the global deformation can be estimated independently of the local deformation, reducing the sample complexity to $O((C/\epsilon)^{d'})$ with $d' \ll d$. From the analysis, this thesis shows the connections between approaches that are traditionally considered to be of very different nature. New theoretical conjectures on approaches such as Deep Learning are also provided. In practice, broad applications of the proposed approaches are demonstrated in estimating water distortion, air turbulence, cloth deformation and human pose, with state-of-the-art results. Some approaches even achieve near real-time performance. Finally, application-dependent physics-based models are built, with good performance in document rectification and scene depth recovery in turbulent media.

September 18, 2013 DRAFT
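The sample-complexity orders quoted in the abstract can be compared numerically. The sketch below is illustrative only: it assumes the direct learning bound scales as (1/ε)^d, the Data-Driven Descent bound as C^d log(1/ε), the Top-down bound as C1^d + C2 log(1/ε), and the Bottom-up bound as (C/ε)^d' with d' much smaller than d. The function names, the constants C, C1, C2, and the values of d, d' and ε are hypothetical choices made for illustration and are not taken from the thesis; big-O constants are dropped.

```python
import math


def direct_learning_samples(eps: float, d: int) -> float:
    """Assumed order (1/eps)^d: dense sampling of a d-dimensional parameter space."""
    return (1.0 / eps) ** d


def data_driven_descent_samples(eps: float, d: int, C: float) -> float:
    """Assumed order C^d * log(1/eps)."""
    return (C ** d) * math.log(1.0 / eps)


def top_down_samples(eps: float, d: int, C1: float, C2: float) -> float:
    """Assumed order C1^d + C2 * log(1/eps)."""
    return C1 ** d + C2 * math.log(1.0 / eps)


def bottom_up_samples(eps: float, d_reduced: int, C: float) -> float:
    """Assumed order (C/eps)^d' where d_reduced = d' is much smaller than d."""
    return (C / eps) ** d_reduced


if __name__ == "__main__":
    # Hypothetical values chosen only to show how the orders grow as eps shrinks.
    d, d_reduced = 6, 2
    C = C1 = C2 = 3.0
    for eps in (0.1, 0.01, 0.001):
        print(
            f"eps={eps:g}  "
            f"direct={direct_learning_samples(eps, d):.3g}  "
            f"descent={data_driven_descent_samples(eps, d, C):.3g}  "
            f"top_down={top_down_samples(eps, d, C1, C2):.3g}  "
            f"bottom_up={bottom_up_samples(eps, d_reduced, C):.3g}"
        )
```

Under these assumed constants, the direct bound grows polynomially in 1/ε with exponent d, while the Data-Driven Descent and Top-down orders grow only logarithmically in 1/ε, which is the qualitative gap the abstract describes.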

Yuandong Tian
