Feature extraction on faces : from landmark localization to depth estimation

This thesis focuses on learning algorithms that extract important features from faces. The features of main interest are landmarks; the two dimensional (2D) or three dimensional (3D) locations of important facial features such as eye centers, nose tip, and mouth corners. Landmarks are used to solve complex tasks that cannot be solved directly or require guidance for enhanced performance, such as pose or gesture recognition, tracking, or face verification. The application of the models presented in this thesis is on facial images; however, the algorithms proposed are more general and can be applied to the landmarks of other forms of objects, such as hands, full body or man-made objects. This thesis is written by article and explores different techniques to solve various aspects of landmark localization. In the first article, we disentangle identity and expression of a given face to learn a prior distribution over the joint set of landmarks. This prior is then merged with a discriminative classifier that learns an independent probability distribution per landmark. The merged model is capable of explaining differences in expressions for the same identity representation. In the second article, we propose an architecture that aims at uncovering image features to do tasks that require high pixel-level accuracy, such as landmark localization or image segmentation. The proposed architecture gradually extracts coarser features in its encoding steps to get more global information over the image and then it expands the coarse features back to the image resolution by recombining the features of the encoding path. The model, termed Recombinator Networks, obtained state-of-the-art on several datasets, while also speeding up training. In the third article, we aim at improving landmark localization when only a few images with labelled landmarks are available. In particular, we leverage a weaker form of data labels that are easier to acquire or more abundantly available such as emotion or head pose. To do so, we propose an architecture to backpropagate gradients of the weaker labels through landmarks, effectively training the landmark localization network. We also propose an unsupervised loss component

[1]  Paul A. Viola,et al.  Robust Real-Time Face Detection , 2001, International Journal of Computer Vision.

[2]  Noah Snavely,et al.  Unsupervised Learning of Depth and Ego-Motion from Video , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Zhe L. Lin,et al.  Nonparametric Context Modeling of Local Appearance for Pose- and Expression-Robust Facial Landmark Localization , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[4]  Takeo Kanade,et al.  Dense 3D face alignment from 2D videos in real-time , 2015, 2015 11th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG).

[5]  Rama Chellappa,et al.  HyperFace: A Deep Multi-Task Learning Framework for Face Detection, Landmark Localization, Pose Estimation, and Gender Recognition , 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[6]  Yoshua Bengio,et al.  Extracting and composing robust features with denoising autoencoders , 2008, ICML '08.

[7]  Geoffrey E. Hinton,et al.  A Learning Algorithm for Boltzmann Machines , 1985, Cogn. Sci..

[8]  Jason Weston,et al.  Deep learning via semi-supervised embedding , 2008, ICML '08.

[9]  Graham W. Taylor,et al.  Multi-task Learning of Facial Landmarks and Expression , 2014, 2014 Canadian Conference on Computer and Robot Vision.

[10]  Qiang Ji,et al.  Real Time Eye Gaze Tracking with 3D Deformable Eye-Face Model , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[11]  Pascal Vincent,et al.  Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion , 2010, J. Mach. Learn. Res..

[12]  J. Besag On the Statistical Analysis of Dirty Pictures , 1986 .

[13]  Matthew D. Zeiler ADADELTA: An Adaptive Learning Rate Method , 2012, ArXiv.

[14]  Thomas S. Huang,et al.  Interactive Facial Feature Localization , 2012, ECCV.

[15]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[16]  Nitish Srivastava,et al.  Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..

[17]  Xiaoou Tang,et al.  Facial Landmark Detection by Deep Multi-task Learning , 2014, ECCV.

[18]  Ralph Gross,et al.  Generic vs. person specific active appearance models , 2005, Image Vis. Comput..

[19]  Xiaoou Tang,et al.  Learning Deep Representation for Face Alignment with Auxiliary Attributes , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[20]  Alex Krizhevsky,et al.  Learning Multiple Layers of Features from Tiny Images , 2009 .

[21]  Thomas Brox,et al.  3D U-Net: Learning Dense Volumetric Segmentation from Sparse Annotation , 2016, MICCAI.

[22]  Pietro Perona,et al.  Robust Face Landmark Estimation under Occlusion , 2013, 2013 IEEE International Conference on Computer Vision.

[23]  Michael J. Jones,et al.  Fully automatic pose-invariant face recognition via 3D pose normalization , 2011, 2011 International Conference on Computer Vision.

[24]  Junzhou Huang,et al.  Pose-Free Facial Landmark Fitting via Optimized Part Mixtures and Cascaded Deformable Shape Model , 2013, 2013 IEEE International Conference on Computer Vision.

[25]  Xiaogang Wang,et al.  FaceID-GAN: Learning a Symmetry Three-Player GAN for Identity-Preserving Face Synthesis , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[26]  Hanjiang Lai,et al.  Robust Facial Landmark Detection via Recurrent Attentive-Refinement Networks , 2016, ECCV.

[27]  Ran He,et al.  Beyond Face Rotation: Global and Local Perception GAN for Photorealistic and Identity Preserving Frontal View Synthesis , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[28]  Li-Jia Li,et al.  Multi-view Face Detection Using Deep Convolutional Neural Networks , 2015, ICMR.

[29]  Yann LeCun,et al.  Pedestrian Detection with Unsupervised Multi-stage Feature Learning , 2012, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[30]  John Salvatier,et al.  Theano: A Python framework for fast computation of mathematical expressions , 2016, ArXiv.

[31]  Georgios Tzimiropoulos,et al.  Project-Out Cascaded Regression with an application to face alignment , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Alexei A. Efros,et al.  Image-to-Image Translation with Conditional Adversarial Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  D. Hubel,et al.  Receptive fields and functional architecture of monkey striate cortex , 1968, The Journal of physiology.

[34]  Andrew Zisserman,et al.  Deep Face Recognition , 2015, BMVC.

[35]  Pierre Baldi,et al.  The dropout learning algorithm , 2014, Artif. Intell..

[36]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[37]  Shiguang Shan,et al.  Locality-Constrained Active Appearance Model , 2012, ACCV.

[38]  Fang Zhao,et al.  Towards Pose Invariant Face Recognition in the Wild , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[39]  Katerina Fragkiadaki,et al.  Adversarial Inverse Graphics Networks: Learning 2D-to-3D Lifting and Image-to-Image Translation from Unpaired Supervision , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[40]  Peter Robinson,et al.  Continuous Conditional Neural Fields for Structured Regression , 2014, ECCV.

[41]  Joshua B. Tenenbaum,et al.  Learning bilinear models for two-factor problems in vision , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[42]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[43]  Guosheng Lin,et al.  Deep convolutional neural fields for depth estimation from a single image , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[44]  Pascal Vincent,et al.  Unsupervised Feature Learning and Deep Learning: A Review and New Perspectives , 2012, ArXiv.

[45]  Huaizu Jiang,et al.  Face Detection with the Faster R-CNN , 2016, 2017 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017).

[46]  Mingrui Wu,et al.  Gradient descent optimization of smoothed information retrieval metrics , 2010, Information Retrieval.

[47]  Fernando De la Torre,et al.  Supervised Descent Method and Its Applications to Face Alignment , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[48]  Andrew Zisserman,et al.  Hello! My name is... Buffy'' -- Automatic Naming of Characters in TV Video , 2006, BMVC.

[49]  Yuichi Yoshida,et al.  Spectral Normalization for Generative Adversarial Networks , 2018, ICLR.

[50]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[51]  Nicu Sebe,et al.  Recurrent Convolutional Face Alignment , 2016, ACCV.

[52]  Tim Salimans,et al.  Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks , 2016, NIPS.

[53]  Tal Hassner,et al.  Viewing Real-World Faces in 3D , 2013, 2013 IEEE International Conference on Computer Vision.

[54]  Yoshua Bengio,et al.  Understanding the difficulty of training deep feedforward neural networks , 2010, AISTATS.

[55]  Fang Zhao,et al.  Dual-Agent GANs for Photorealistic and Identity Preserving Profile Face Synthesis , 2017, NIPS.

[56]  Sina Honari,et al.  Improving Landmark Localization with Semi-Supervised Learning , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[57]  拓海 杉山,et al.  “Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks”の学習報告 , 2017 .

[58]  Tapani Raiko,et al.  Semi-supervised Learning with Ladder Networks , 2015, NIPS.

[59]  Tomasz Malisiewicz,et al.  SuperPoint: Self-Supervised Interest Point Detection and Description , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[60]  Soumith Chintala,et al.  Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks , 2015, ICLR.

[61]  Aaron C. Courville,et al.  Improved Training of Wasserstein GANs , 2017, NIPS.

[62]  Xiaogang Wang,et al.  Deep Convolutional Network Cascade for Facial Point Detection , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[63]  Takeo Kanade,et al.  Neural Network-Based Face Detection , 1998, IEEE Trans. Pattern Anal. Mach. Intell..

[64]  Yoshua Bengio,et al.  Why Does Unsupervised Pre-training Help Deep Learning? , 2010, AISTATS.

[65]  Hans-Peter Seidel,et al.  Exchanging Faces in Images , 2004, Comput. Graph. Forum.

[66]  Sami Romdhani,et al.  Face identification across different poses and illuminations with a 3D morphable model , 2002, Proceedings of Fifth IEEE International Conference on Automatic Face Gesture Recognition.

[67]  Wen Gao,et al.  Efficient 3D reconstruction for face recognition , 2005, Pattern Recognit..

[68]  Li Zhang,et al.  Collaborative Facial Landmark Localization for Transferring Annotations Across Datasets , 2014, ECCV.

[69]  Raphaël Féraud,et al.  A Fast and Accurate Face Detector Based on Neural Networks , 2001, IEEE Trans. Pattern Anal. Mach. Intell..

[70]  Antti Oulasvirta,et al.  Interactive Markerless Articulated Hand Motion Tracking Using RGB and Depth Data , 2013, 2013 IEEE International Conference on Computer Vision.

[71]  Xiaoming Liu,et al.  Disentangled Representation Learning GAN for Pose-Invariant Face Recognition , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[72]  Cheng Li,et al.  Face alignment by coarse-to-fine shape searching , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[73]  Sina Honari,et al.  Recombinator Networks: Learning Coarse-to-Fine Feature Aggregation , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[74]  Ping Tan,et al.  DualGAN: Unsupervised Dual Learning for Image-to-Image Translation , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[75]  Andrea Vedaldi,et al.  Unsupervised Learning of Object Landmarks by Factorized Spatial Embeddings , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[76]  D. Hubel,et al.  Receptive fields of optic nerve fibres in the spider monkey , 1960, The Journal of physiology.

[77]  Ioannis Patras,et al.  Sieving Regression Forest Votes for Facial Feature Detection in the Wild , 2013, 2013 IEEE International Conference on Computer Vision.

[78]  Feng Zhou,et al.  Deep Deformation Network for Object Landmark Localization , 2016, ECCV.

[79]  Xueyin Lin,et al.  Facial expressional image synthesis controlled by emotional parameters , 2005, Pattern Recognit. Lett..

[80]  D. Hubel,et al.  Cortical and callosal connections concerned with the vertical meridian of visual fields in the cat. , 1967, Journal of neurophysiology.

[81]  Otmar Hilliges,et al.  Learning to find eye region landmarks for remote gaze estimation in unconstrained settings , 2018, ETRA.

[82]  Qiang Ji,et al.  A joint cascaded framework for simultaneous eye detection and eye state estimation , 2017, Pattern Recognit..

[83]  Trevor Darrell,et al.  Fully Convolutional Networks for Semantic Segmentation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[84]  Jean-Marc Odobez,et al.  Gaze estimation from multimodal Kinect data , 2012, 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops.

[85]  Georgios Tzimiropoulos,et al.  Large Pose 3D Face Reconstruction from a Single Image via Direct Volumetric CNN Regression , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[86]  Demetri Terzopoulos,et al.  Multilinear Analysis of Image Ensembles: TensorFaces , 2002, ECCV.

[87]  Luc Van Gool,et al.  Crossing Nets: Combining GANs and VAEs with a Shared Latent Space for Hand Pose Estimation , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[88]  Vincent Lepetit,et al.  LIFT: Learned Invariant Feature Transform , 2016, ECCV.

[89]  Jian Sun,et al.  Cascaded hand pose regression , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[90]  Jian Sun,et al.  Face Alignment by Explicit Shape Regression , 2012, International Journal of Computer Vision.

[91]  Toby P. Breckon,et al.  Real-Time Monocular Depth Estimation Using Synthetic Data with Domain Adaptation via Image Style Transfer , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[92]  Cheng Cheng,et al.  A Deep Regression Architecture with Two-Stage Re-initialization for High Performance Facial Landmark Detection , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[93]  Tal Hassner,et al.  Effective face frontalization in unconstrained images , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[94]  Shuo Yang,et al.  WIDER FACE: A Face Detection Benchmark , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[95]  Alessio Del Bue,et al.  Bilinear Modeling via Augmented Lagrange Multipliers (BALM) , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[96]  Xiaogang Wang,et al.  Deep Learning Face Representation by Joint Identification-Verification , 2014, NIPS.

[97]  Stefanos Zafeiriou,et al.  Robust Discriminative Response Map Fitting with Constrained Local Models , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[98]  Bernhard P. Wrobel,et al.  Multiple View Geometry in Computer Vision , 2001 .

[99]  Pascal Vincent,et al.  Representation Learning: A Review and New Perspectives , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[100]  Anelia Angelova,et al.  Unsupervised Learning of Depth and Ego-Motion from Monocular Video Using 3D Geometric Constraints , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[101]  Lawrence D. Jackel,et al.  Backpropagation Applied to Handwritten Zip Code Recognition , 1989, Neural Computation.

[102]  Xueyin Lin,et al.  Emotional facial expression model building , 2003, Pattern Recognition Letters.

[103]  Brendan J. Frey,et al.  Estimating mixture models of images and inferring spatial transformations using the EM algorithm , 1999, Proceedings. 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No PR00149).

[104]  Shiguang Shan,et al.  Coarse-to-Fine Auto-Encoder Networks (CFAN) for Real-Time Face Alignment , 2014, ECCV.

[105]  Franziska Hoffmann,et al.  Spatial Tessellations Concepts And Applications Of Voronoi Diagrams , 2016 .

[106]  Maja Pantic,et al.  Gauss-Newton Deformable Part Models for Face Alignment In-the-Wild , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[107]  Shaun J. Canavan,et al.  BP4D-Spontaneous: a high-resolution spontaneous 3D dynamic facial expression database , 2014, Image Vis. Comput..

[108]  Ersin Yumer,et al.  Self-supervised Learning of Motion Capture , 2017, NIPS.

[109]  Jian Sun,et al.  Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[110]  Geoffrey E. Hinton,et al.  Reducing the Dimensionality of Data with Neural Networks , 2006, Science.

[111]  David G. Lowe,et al.  Object recognition from local scale-invariant features , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[112]  R. Vaillant,et al.  Original approach for the localisation of objects in images , 1994 .

[113]  Cheng Li,et al.  Unconstrained Face Alignment via Cascaded Compositional Learning , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[114]  Ross B. Girshick,et al.  Mask R-CNN , 2017, 1703.06870.

[115]  Oisin Mac Aodha,et al.  Unsupervised Monocular Depth Estimation with Left-Right Consistency , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[116]  Tae-Kyun Kim,et al.  Latent Regression Forest: Structured Estimation of 3D Articulated Hand Posture , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[117]  Patrick Pérez,et al.  MoFA: Model-Based Deep Convolutional Face Autoencoder for Unsupervised Monocular Reconstruction , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[118]  Timothy F. Cootes,et al.  Active Appearance Models , 1998, ECCV.

[119]  Hanspeter Pfister,et al.  Face transfer with multilinear models , 2005, ACM Trans. Graph..

[120]  Yoram Singer,et al.  Adaptive Subgradient Methods for Online Learning and Stochastic Optimization , 2011, J. Mach. Learn. Res..

[121]  Seyed-Ahmad Ahmadi,et al.  V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation , 2016, 2016 Fourth International Conference on 3D Vision (3DV).

[122]  Timothy F. Cootes,et al.  Active Shape Models-Their Training and Application , 1995, Comput. Vis. Image Underst..

[123]  Thomas Brox,et al.  U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.

[124]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[125]  Shaun J. Canavan,et al.  Hand Pointing Estimation for Human Computer Interaction Based on Two Orthogonal-Views , 2010, 2010 20th International Conference on Pattern Recognition.

[126]  Matthew Turk,et al.  A Morphable Model For The Synthesis Of 3D Faces , 1999, SIGGRAPH.

[127]  Sergio Escalera,et al.  Depth-Based 3D Hand Pose Estimation: From Current Achievements to Future Goals , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[128]  Jakub Nalepa,et al.  Self-adaptive algorithm for segmenting skin regions , 2014, EURASIP J. Adv. Signal Process..

[129]  Michael I. Jordan,et al.  A Probabilistic Interpretation of Canonical Correlation Analysis , 2005 .

[130]  Karthik Ramani,et al.  DeepHand: Robust Hand Pose Estimation by Completing a Matrix Imputed with Deep Features , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[131]  Timothy F. Cootes,et al.  Boosted Regression Active Shape Models , 2007, BMVC.

[132]  Ming Yang,et al.  DeepFace: Closing the Gap to Human-Level Performance in Face Verification , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[133]  Christophe Garcia,et al.  Convolutional face finder: a neural architecture for fast and robust face detection , 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[134]  Timo Aila,et al.  Temporal Ensembling for Semi-Supervised Learning , 2016, ICLR.

[135]  Wojciech Zaremba,et al.  Improved Techniques for Training GANs , 2016, NIPS.

[136]  Sami Romdhani,et al.  A 3D Face Model for Pose and Illumination Invariant Face Recognition , 2009, 2009 Sixth IEEE International Conference on Advanced Video and Signal Based Surveillance.

[137]  Rob Fergus,et al.  Depth Map Prediction from a Single Image using a Multi-Scale Deep Network , 2014, NIPS.

[138]  D. Hubel,et al.  Receptive fields of single neurones in the cat's striate cortex , 1959, The Journal of physiology.

[139]  Stefanos Zafeiriou,et al.  300 Faces in-the-Wild Challenge: The First Facial Landmark Localization Challenge , 2013, 2013 IEEE International Conference on Computer Vision Workshops.

[140]  Gustavo Carneiro,et al.  Unsupervised CNN for Single View Depth Estimation: Geometry to the Rescue , 2016, ECCV.

[141]  Ken Perlin,et al.  Real-Time Continuous Pose Recovery of Human Hands Using Convolutional Networks , 2014, ACM Trans. Graph..

[142]  Andrew L. Maas Rectifier Nonlinearities Improve Neural Network Acoustic Models , 2013 .

[143]  Christopher Joseph Pal,et al.  EmoNets: Multimodal deep learning approaches for emotion recognition in video , 2015, Journal on Multimodal User Interfaces.

[144]  Zhengqi Li,et al.  MegaDepth: Learning Single-View Depth Prediction from Internet Photos , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[145]  Jakub Nalepa,et al.  Real-Time Hand Shape Classification , 2014, ArXiv.

[146]  Simon Lucey,et al.  Face alignment through subspace constrained mean-shifts , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[147]  Christopher Joseph Pal,et al.  Localizing Facial Keypoints with Global Descriptor Search, Neighbour Alignment and Locally Linear Models , 2013, 2013 IEEE International Conference on Computer Vision Workshops.

[148]  Ira Kemelmacher-Shlizerman,et al.  Ieee Transactions on Pattern Analysis and Machine Intelligence 1 3d Face Reconstruction from a Single Image Using a Single Reference Face Shape , 2022 .

[149]  Jonathan T. Barron,et al.  Aperture Supervision for Monocular Depth Estimation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[150]  Deva Ramanan,et al.  Face detection, pose estimation, and landmark localization in the wild , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[151]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[152]  D Zipser,et al.  Learning the hidden structure of speech. , 1988, The Journal of the Acoustical Society of America.

[153]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[154]  Pascal Vincent,et al.  Generalized Denoising Auto-Encoders as Generative Models , 2013, NIPS.

[155]  Jia Deng,et al.  Stacked Hourglass Networks for Human Pose Estimation , 2016, ECCV.

[156]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[157]  Jitendra Malik,et al.  Hypercolumns for object segmentation and fine-grained localization , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[158]  Zhengyou Zhang,et al.  Improving multiview face detection with multi-task deep convolutional neural networks , 2014, IEEE Winter Conference on Applications of Computer Vision.

[159]  Simon Lucey,et al.  Deformable Model Fitting by Regularized Landmark Mean-Shift , 2010, International Journal of Computer Vision.

[160]  Timothy F. Cootes,et al.  Statistical models of face images - improving specificity , 1998, Image Vis. Comput..

[161]  Tal Hassner,et al.  Do We Really Need to Collect Millions of Faces for Effective Face Recognition? , 2016, ECCV.

[162]  Pascal Vincent,et al.  Contractive Auto-Encoders: Explicit Invariance During Feature Extraction , 2011, ICML.

[163]  Jian Sun,et al.  Face Alignment at 3000 FPS via Regressing Local Binary Features , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[164]  Alan L. Yuille,et al.  Feature extraction from faces using deformable templates , 2004, International Journal of Computer Vision.

[165]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[166]  Yann LeCun,et al.  Traffic sign recognition with multi-scale Convolutional Networks , 2011, The 2011 International Joint Conference on Neural Networks.

[167]  Li Shen,et al.  Unified model in identity subspace for face recognition , 2004, Journal of Computer Science and Technology.

[168]  Timothy F. Cootes,et al.  Feature Detection and Tracking with Constrained Local Models , 2006, BMVC.

[169]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[170]  Hod Lipson,et al.  Understanding Neural Networks Through Deep Visualization , 2015, ArXiv.

[171]  Marwan Mattar,et al.  Labeled Faces in the Wild: A Database forStudying Face Recognition in Unconstrained Environments , 2008 .

[172]  Mark Sandler,et al.  CycleGAN, a Master of Steganography , 2017, ArXiv.

[173]  Vladlen Koltun,et al.  Multi-Scale Context Aggregation by Dilated Convolutions , 2015, ICLR.

[174]  Koen E. A. van de Sande,et al.  Selective Search for Object Recognition , 2013, International Journal of Computer Vision.

[175]  Christian Szegedy,et al.  DeepPose: Human Pose Estimation via Deep Neural Networks , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[176]  Jonathan Tompson,et al.  Efficient object localization using Convolutional Networks , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[177]  Umar Mohammed,et al.  Probabilistic Models for Inference about Identity , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[178]  András Lörincz,et al.  3D shape estimation in video sequences provides high precision evaluation of facial expressions , 2012, Image Vis. Comput..

[179]  Horst Bischof,et al.  Annotated Facial Landmarks in the Wild: A large-scale, real-world database for facial landmark localization , 2011, 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops).

[180]  David Cristinacce,et al.  Automatic feature localisation with constrained local models , 2008, Pattern Recognit..

[181]  Xiaoming Liu,et al.  Nonlinear 3D Face Morphable Model , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[182]  Xiaoming Liu,et al.  Coefficients Pose-Variant Input Recogni 8 on Engine Frontalized Output Generator FF-GAN D Discriminator Extreme Pose Input Frontalized Output , 2017 .

[183]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[184]  Ankush Gupta,et al.  Unsupervised Learning of Object Landmarks through Conditional Image Generation , 2018, NeurIPS.

[185]  Alex Pentland,et al.  View-based and modular eigenspaces for face recognition , 1994, 1994 Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.

[186]  Mario Fritz,et al.  Appearance-based gaze estimation in the wild , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[187]  Sepp Hochreiter,et al.  Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs) , 2015, ICLR.

[188]  Dragos Datcu,et al.  Free-hands interaction in augmented reality , 2013, SUI '13.

[189]  Max Welling,et al.  Auto-Encoding Variational Bayes , 2013, ICLR.

[190]  Emil M. Petriu,et al.  Hand gesture recognition using Bag-of-features and multi-class Support Vector Machine , 2010, 2010 IEEE International Symposium on Haptic Audio Visual Environments and Games.

[191]  Sina Honari,et al.  Distribution Matching Losses Can Hallucinate Features in Medical Image Translation , 2018, MICCAI.