Paper Information - DenseReg: Fully Convolutional Dense Shape Regression In-the-Wild

In this paper we propose to learn a mapping from image pixels into a dense template grid through a fully convolutional network. We formulate this task as a regression problem and train our network by leveraging manually annotated facial landmarks in-the-wild. We use these landmarks to establish a dense correspondence field between a three-dimensional object template and the input image, which then serves as the ground truth for training our regression system. We show that we can combine ideas from semantic segmentation with regression networks, yielding a highly accurate quantized regression architecture. Our system, called DenseReg, allows us to estimate dense image-to-template correspondences in a fully convolutional manner. As such, our network can provide useful correspondence information as a stand-alone system; when used as an initialization for Statistical Deformable Models, it yields landmark localization results that largely outperform the current state of the art on the challenging 300W benchmark. We thoroughly evaluate our method on a host of facial analysis tasks, and demonstrate its use for other correspondence estimation tasks, such as the human body and the human ear. DenseReg code is made available at http://alpguler.com/DenseReg.html along with supplementary materials.
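
The "quantized regression" idea mentioned in the abstract, combining a segmentation-style classification with a regression, can be illustrated with a minimal sketch. It assumes a template coordinate `u` normalized to [0, 1] that is split into a coarse bin index (the classification target) and a residual within that bin (the regression target). The function names, the bin count, and the coordinate range are hypothetical choices for illustration and are not taken from the DenseReg code.

```python
import math


def quantize(u, num_bins):
    """Split u in [0, 1] into (bin index, residual inside the bin).

    Assumption for illustration: bins are uniform and the residual is
    expressed in units of one bin width, so it lies in [0, 1].
    """
    scaled = min(max(u, 0.0), 1.0) * num_bins
    # The upper edge u == 1 is assigned to the last bin (residual 1).
    q = min(int(math.floor(scaled)), num_bins - 1)
    r = scaled - q
    return q, r


def dequantize(q, r, num_bins):
    """Recombine a bin index and a residual into a coordinate in [0, 1]."""
    return (q + r) / num_bins


if __name__ == "__main__":
    q, r = quantize(0.375, 4)
    print(q, r, dequantize(q, r, 4))
```

In a network of this kind, the bin index would typically be predicted by a per-pixel classification branch and the residual by a per-pixel regression branch; the sketch only shows how the two targets relate to the original coordinate.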
