Sparse Representation for 3D Shape Estimation: A Convex Relaxation Approach

We investigate the problem of estimating the 3D shape of an object defined by a set of 3D landmarks, given their 2D correspondences in a single image. A successful approach to alleviating the reconstruction ambiguity is the 3D deformable shape model and a sparse representation is often used to capture complex shape variability. But the model inference is still challenging due to the nonconvexity in the joint optimization of shape and viewpoint. In contrast to prior work that relies on an alternating scheme whose solution depends on initialization, we propose a convex approach to addressing this challenge and develop an efficient algorithm to solve the proposed convex program. We further propose a robust model to handle gross errors in the 2D correspondences. We demonstrate the exact recovery property of the proposed method, the advantage compared to several nonconvex baselines and the applicability to recover 3D human poses and car models from single images.

[1]  Xiaowei Zhou,et al.  3D Shape Reconstruction from 2D Landmarks: A Convex Formulation , 2014, ArXiv.

[2]  Yurii Nesterov,et al.  Generalized Power Method for Sparse Principal Component Analysis , 2008, J. Mach. Learn. Res..

[3]  Bernt Schiele,et al.  Monocular 3D pose estimation and tracking by detection , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[4]  Alexei A. Efros,et al.  Seeing 3D Chairs: Exemplar Part-Based 2D-3D Alignment Using a Large Dataset of CAD Models , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[5]  Jing Xiao,et al.  A Closed-Form Solution to Non-Rigid Shape and Motion Recovery , 2004, International Journal of Computer Vision.

[6]  Jitendra Malik,et al.  Virtual view networks for object reconstruction , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Alessio Del Bue,et al.  Optimal Metric Projections for Deformable and Articulated Structure-from-Motion , 2011, International Journal of Computer Vision.

[8]  Jing Xiao,et al.  A Closed-Form Solution to Non-rigid Shape and Motion Recovery , 2004, ECCV.

[9]  Jitendra Malik,et al.  Category-specific object reconstruction from a single image , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Pablo A. Parrilo,et al.  The Convex Geometry of Linear Inverse Problems , 2010, Foundations of Computational Mathematics.

[11]  Larry S. Davis,et al.  Jointly Optimizing 3D Model Fitting and Fine-Grained Classification , 2014, ECCV.

[12]  Vladlen Koltun,et al.  Single-view reconstruction via joint analysis of image and shape collections , 2015, ACM Trans. Graph..

[13]  Takeo Kanade,et al.  3D Alignment of Face in a Single Image , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[14]  Silvio Savarese,et al.  Estimating the aspect layout of object categories , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[15]  Ankur Agarwal,et al.  Recovering 3D human pose from monocular images , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[16]  Henning Biermann,et al.  Recovering non-rigid 3D shape from image streams , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[17]  Timothy F. Cootes,et al.  Active Shape Models-Their Training and Application , 1995, Comput. Vis. Image Underst..

[18]  Michael Isard,et al.  Active Contours , 2000, Springer London.

[19]  Simon Lucey,et al.  Complex Non-rigid Motion 3D Reconstruction by Union of Subspaces , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[20]  Guillermo Sapiro,et al.  Online Learning for Matrix Factorization and Sparse Coding , 2009, J. Mach. Learn. Res..

[21]  Sven J. Dickinson,et al.  3D Object Detection and Viewpoint Estimation with a Deformable 3D Cuboid Model , 2012, NIPS.

[22]  Julien Mairal,et al.  Optimization with Sparsity-Inducing Penalties , 2011, Found. Trends Mach. Learn..

[23]  Junzhou Huang,et al.  Sparse shape composition: A new framework for shape prior modeling , 2011, CVPR 2011.

[24]  Lourdes Agapito,et al.  Reconstructing PASCAL VOC , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[25]  Jing Xiao,et al.  Real-time combined 2D+3D active appearance models , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[26]  Leonidas J. Guibas,et al.  Estimating image depth using shape collections , 2014, ACM Trans. Graph..

[27]  David A. McAllester,et al.  A discriminatively trained, multiscale, deformable part model , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[28]  Bamdev Mishra,et al.  Manopt, a matlab toolbox for optimization on manifolds , 2013, J. Mach. Learn. Res..

[29]  Wen Gao,et al.  Robust Estimation of 3D Human Poses from a Single Image , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[30]  Li Zhang,et al.  Model evolution: An incremental approach to non-rigid structure from motion , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[31]  Bernt Schiele,et al.  Detailed 3D Representations for Object Recognition and Modeling , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[32]  Bingsheng He,et al.  The direct extension of ADMM for multi-block convex minimization problems is not necessarily convergent , 2014, Mathematical Programming.

[33]  Fernando De la Torre,et al.  Spatio-temporal Matching for Human Detection in Video , 2014, ECCV.

[34]  Stephen P. Boyd,et al.  Proximal Algorithms , 2013, Found. Trends Optim..

[35]  Xiaowei Zhou,et al.  Articulated motion estimation from a monocular image sequence using spherical tangent bundles , 2016, 2016 IEEE International Conference on Robotics and Automation (ICRA).

[36]  Yi Yang,et al.  Articulated pose estimation with flexible mixtures-of-parts , 2011, CVPR 2011.

[37]  Emmanuel J. Candès,et al.  The Power of Convex Relaxation: Near-Optimal Matrix Completion , 2009, IEEE Transactions on Information Theory.

[38]  Silvio Savarese,et al.  Beyond PASCAL: A benchmark for 3D object detection in the wild , 2014, IEEE Winter Conference on Applications of Computer Vision.

[39]  Stephen P. Boyd,et al.  Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers , 2011, Found. Trends Mach. Learn..

[40]  Youjie Zhou,et al.  Pose Locality Constrained Representation for 3D Human Pose Reconstruction , 2014, ECCV.

[41]  Deva Ramanan,et al.  Learning to parse images of articulated bodies , 2006, NIPS.

[42]  T. Kanade,et al.  Reconstructing 3D Human Pose from 2D Image Landmarks , 2012, ECCV.

[43]  Alan Edelman,et al.  The Geometry of Algorithms with Orthogonality Constraints , 1998, SIAM J. Matrix Anal. Appl..

[44]  Thomas Vetter,et al.  Face Recognition Based on Fitting a 3D Morphable Model , 2003, IEEE Trans. Pattern Anal. Mach. Intell..

[45]  David L. Donoho,et al.  Observed universality of phase transitions in high-dimensional geometry, with implications for modern data analysis and signal processing , 2009, Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences.

[46]  Andrew W. Fitzgibbon,et al.  What Shape Are Dolphins? Building 3D Morphable Models from 2D Images , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[47]  Lourdes Agapito,et al.  Good Vibrations: A Modal Analysis Approach for Sequential Non-rigid Structure from Motion , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[48]  Michael J. Black,et al.  Predicting 3D People from 2D Pictures , 2006, AMDO.

[49]  Michael J. Black,et al.  Estimating human shape and pose from a single image , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[50]  E.J. Candes,et al.  An Introduction To Compressive Sampling , 2008, IEEE Signal Processing Magazine.

[51]  Francesc Moreno-Noguer,et al.  A Joint Model for 2D and 3D Pose Estimation from a Single Image , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[52]  Leonidas J. Guibas,et al.  Functional map networks for analyzing and exploring large shape collections , 2014, ACM Trans. Graph..

[53]  Michael J. Black,et al.  Predicting 3 D People from 2 D Pictures , .

[54]  Alessio Del Bue,et al.  Bilinear Modeling via Augmented Lagrange Multipliers (BALM) , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[55]  Deva Ramanan,et al.  Analyzing 3D Objects in Cluttered Images , 2012, NIPS.

[56]  Takeo Kanade,et al.  Real-time combined 2D+3D active appearance models , 2004, CVPR 2004.

[57]  Michael J. Black,et al.  Pose-conditioned joint angle limits for 3D human pose reconstruction , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[58]  Leonidas J. Guibas,et al.  Consistent Shape Maps via Semidefinite Programming , 2013, SGP '13.

[59]  Camillo J. Taylor,et al.  Reconstruction of Articulated Objects from Point Correspondences in a Single Uncalibrated Image , 2000, Comput. Vis. Image Underst..

[60]  Ahmed M. Elgammal,et al.  Inferring 3D body pose from silhouettes using activity manifold learning , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[61]  Kun Zhou,et al.  3D shape regression for real-time facial animation , 2013, ACM Trans. Graph..