3DCaricShop: A Dataset and A Baseline Method for Single-view 3D Caricature Face Reconstruction

Caricature is an artistic representation that deliberately exaggerates the distinctive features of a human face to convey humor or sarcasm. However, reconstructing a 3D caricature from a 2D caricature image remains a challenging task, mostly due to the lack of data. We propose to fill this gap by introducing 3DCaricShop, the first large-scale 3D caricature dataset that contains 2000 high-quality diversified 3D caricatures manually crafted by professional artists. 3DCaricShop also provides rich annotations including a paired 2D caricature image, camera parameters and 3D facial landmarks. To demonstrate the advantage of 3DCaricShop, we present a novel baseline approach for single-view 3D caricature reconstruction. To ensure a faithful reconstruction with plausible face deformations, we propose to connect the good ends of the detail-rich implicit functions and the parametric mesh representations. In particular, we first register a template mesh to the output of the implicit generator and iteratively project the registration result onto a pre-trained PCA space to resolve artifacts and self-intersections. To deal with the large deformation during non-rigid registration, we propose a novel view-collaborative graph convolution network (VC-GCN) to extract key points from the implicit mesh for accurate alignment. Our method is able to generate high-fidelity 3D caricature in a pre-defined mesh topology that is animation-ready. Extensive experiments have been conducted on 3DCaricShop to verify the significance of the database and the effectiveness of the proposed method. We will release 3DCaricShop upon publication.

[1]  Wen Gao,et al.  Semi‐Supervised Learning in Reconstructed Manifold Space for 3D Caricature Generation , 2009, Comput. Graph. Forum.

[2]  Ruigang Yang,et al.  FaceScape: A Large-Scale High Quality 3D Face Dataset and Detailed Riggable 3D Face Prediction , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Kyoung Mu Lee,et al.  V2V-PoseNet: Voxel-to-Voxel Prediction Network for Accurate 3D Hand and Human Pose Estimation from a Single Depth Map , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[4]  Jianfei Cai,et al.  Alive Caricature from 2D to 3D , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[5]  Michael J. Black,et al.  Lions and Tigers and Bears: Capturing Non-rigid, 3D, Articulated Shape from Images , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[6]  Lin Ma,et al.  PFLD: A Practical Facial Landmark Detector , 2019, ArXiv.

[7]  Peter V. Gehler,et al.  Keep It SMPL: Automatic Estimation of 3D Human Pose and Shape from a Single Image , 2016, ECCV.

[8]  Wei Liu,et al.  Pixel2Mesh: Generating 3D Mesh Models from Single RGB Images , 2018, ECCV.

[9]  Yiying Tong,et al.  FaceWarehouse: A 3D Facial Expression Database for Visual Computing , 2014, IEEE Transactions on Visualization and Computer Graphics.

[10]  Yinghuan Shi,et al.  WebCaricature: a benchmark for caricature recognition , 2017, BMVC.

[11]  Ron Kimmel,et al.  Unrestricted Facial Geometry Reconstruction Using Image-to-Image Translation , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[12]  Leonidas J. Guibas,et al.  PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space , 2017, NIPS.

[13]  Feng Liu,et al.  3D Face Modeling From Diverse Raw Scan Data , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[14]  Juyong Zhang,et al.  Landmark Detection and 3D Face Reconstruction for Caricature using a Nonlinear Parametric Model , 2020, ArXiv.

[15]  Sebastian Scherer,et al.  VoxNet: A 3D Convolutional Neural Network for real-time object recognition , 2015, 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[16]  Stefanos Zafeiriou,et al.  A 3D Morphable Model Learnt from 10,000 Faces , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Nick Pears,et al.  Statistical Modeling of Craniofacial Shape and Texture , 2019, International Journal of Computer Vision.

[18]  Xiaoming Liu,et al.  Nonlinear 3D Face Morphable Model , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[19]  Matan Sela,et al.  Learning Detailed Face Reconstruction from a Single Image , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Sebastian Nowozin,et al.  Occupancy Networks: Learning 3D Reconstruction in Function Space , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Olivier D. Faugeras,et al.  Shape From Shading , 2006, Handbook of Mathematical Models in Computer Vision.

[22]  Sami Romdhani,et al.  A 3D Face Model for Pose and Illumination Invariant Face Recognition , 2009, 2009 Sixth IEEE International Conference on Advanced Video and Signal Based Surveillance.

[23]  Jiaolong Yang,et al.  Accurate 3D Face Reconstruction With Weakly-Supervised Learning: From Single Image to Image Set , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[24]  Hao Li,et al.  PIFu: Pixel-Aligned Implicit Function for High-Resolution Clothed Human Digitization , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[25]  Anil K. Jain,et al.  WarpGAN: Automatic Caricature Generation , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[26]  Richard A. Newcombe,et al.  DeepSDF: Learning Continuous Signed Distance Functions for Shape Representation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Leonidas J. Guibas,et al.  PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[28]  Mathieu Aubry,et al.  A Papier-Mache Approach to Learning 3D Surface Generation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[29]  Jan Kautz,et al.  High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[30]  Sami Romdhani,et al.  Optimal Step Nonrigid ICP Algorithms for Surface Registration , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[31]  Xiaoguang Han,et al.  Deep Mesh Reconstruction From Single RGB Images via Topology Modification Networks , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[32]  Leonidas J. Guibas,et al.  ShapeNet: An Information-Rich 3D Model Repository , 2015, ArXiv.

[33]  Hassan Foroosh,et al.  Sparse Convolutional Neural Networks , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[34]  Jing Liao,et al.  CariGANs , 2018, ACM Trans. Graph..

[35]  Yu Qiao,et al.  DF2Net: A Dense-Fine-Finer Network for Detailed 3D Face Reconstruction , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[36]  Chongyang Ma,et al.  Deep Volumetric Video From Very Sparse Multi-view Performance Capture , 2018, ECCV.

[37]  Yizhou Yu,et al.  DeepSketch2Face , 2017, ACM Trans. Graph..

[38]  Matthew Turk,et al.  A Morphable Model For The Synthesis Of 3D Faces , 1999, SIGGRAPH.

[39]  Gernot Riegler,et al.  OctNet: Learning Deep 3D Representations at High Resolutions , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[40]  Silvio Savarese,et al.  3D-R2N2: A Unified Approach for Single and Multi-view 3D Object Reconstruction , 2016, ECCV.

[41]  Tal Hassner,et al.  Extreme 3D Face Reconstruction: Seeing Through Occlusions , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[42]  Abhinav Gupta,et al.  Non-local Neural Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.