UMDFaces: An annotated face dataset for training deep networks

Recent progress in face detection (including keypoint detection), and recognition is mainly being driven by (i) deeper convolutional neural network architectures, and (ii) larger datasets. However, most of the large datasets are maintained by private companies and are not publicly available. The academic computer vision community needs larger and more varied datasets to make further progress. In this paper, we introduce a new face dataset, called UMDFaces, which has 367,888 annotated faces of 8,277 subjects. We also introduce a new face recognition evaluation protocol which will help advance the state-of-the-art in this area. We discuss how a large dataset can be collected and annotated using human annotators and deep networks. We provide human curated bounding boxes for faces. We also provide estimated pose (roll, pitch and yaw), locations of twenty-one key-points and gender information generated by a pre-trained neural network. In addition, the quality of keypoint annotations has been verified by humans for about 115,000 images. Finally, we compare the quality of the dataset with other publicly available face datasets at similar scales.

[1]  James Philbin,et al.  FaceNet: A unified embedding for face recognition and clustering , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Yuxiao Hu,et al.  MS-Celeb-1M: Challenge of Recognizing One Million Celebrities in the Real World , 2016, IMAWM.

[3]  Carlos D. Castillo,et al.  An All-In-One Convolutional Neural Network for Face Analysis , 2016, 2017 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017).

[4]  Stefan Winkler,et al.  A data-driven approach to cleaning large face datasets , 2014, 2014 IEEE International Conference on Image Processing (ICIP).

[5]  Luc Van Gool,et al.  The 2005 PASCAL Visual Object Classes Challenge , 2005, MLCW.

[6]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[7]  Jitendra Malik,et al.  Deformable part models are convolutional neural networks , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Rama Chellappa,et al.  A deep pyramid Deformable Part Model for face detection , 2015, 2015 IEEE 7th International Conference on Biometrics Theory, Applications and Systems (BTAS).

[9]  Xiaogang Wang,et al.  Deep Learning Face Representation from Predicting 10,000 Classes , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[10]  Jian Sun,et al.  Face Alignment by Explicit Shape Regression , 2012, International Journal of Computer Vision.

[11]  Ira Kemelmacher-Shlizerman,et al.  MegaFace: A Million Faces for Recognition at Scale , 2015, ArXiv.

[12]  Andrew Zisserman,et al.  Deep Face Recognition , 2015, BMVC.

[13]  Marwan Mattar,et al.  Labeled Faces in the Wild: A Database forStudying Face Recognition in Unconstrained Environments , 2008 .

[14]  Kilian Q. Weinberger,et al.  Deep Networks with Stochastic Depth , 2016, ECCV.

[15]  Shengcai Liao,et al.  Learning Face Representation from Scratch , 2014, ArXiv.

[16]  Xiaoou Tang,et al.  Surpassing Human-Level Face Verification Performance on LFW with GaussianFace , 2014, AAAI.

[17]  Anil K. Jain,et al.  Pushing the frontiers of unconstrained face detection and recognition: IARPA Janus Benchmark A , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Swami Sankaranarayanan,et al.  Triplet Similarity Embedding for Face Verification , 2016, ArXiv.

[19]  Tal Hassner,et al.  Face recognition in unconstrained videos with matched background similarity , 2011, CVPR 2011.

[20]  Ming Yang,et al.  DeepFace: Closing the Gap to Human-Level Performance in Face Verification , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[21]  Horst Bischof,et al.  Annotated Facial Landmarks in the Wild: A large-scale, real-world database for facial landmark localization , 2011, 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops).

[22]  Fernando De la Torre,et al.  Supervised Descent Method and Its Applications to Face Alignment , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[23]  Ming Yang,et al.  Web-scale training for face identification , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[25]  Xiaogang Wang,et al.  Deep Learning Face Representation by Joint Identification-Verification , 2014, NIPS.

[26]  Rama Chellappa,et al.  HyperFace: A Deep Multi-Task Learning Framework for Face Detection, Landmark Localization, Pose Estimation, and Gender Recognition , 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[27]  Erik Learned-Miller,et al.  FDDB: A benchmark for face detection in unconstrained settings , 2010 .

[28]  Tal Hassner,et al.  Do We Really Need to Collect Millions of Faces for Effective Face Recognition? , 2016, ECCV.

[29]  Andreas Geiger,et al.  Vision meets robotics: The KITTI dataset , 2013, Int. J. Robotics Res..

[30]  Jian Sun,et al.  Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[31]  Shuo Yang,et al.  WIDER FACE: A Face Detection Benchmark , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Xiaogang Wang,et al.  Deeply learned face representations are sparse, selective, and robust , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Pietro Perona,et al.  Robust Face Landmark Estimation under Occlusion , 2013, 2013 IEEE International Conference on Computer Vision.

[34]  Xiaogang Wang,et al.  DeepID3: Face Recognition with Very Deep Neural Networks , 2015, ArXiv.

[35]  Tal Hassner,et al.  Rapid Synthesis of Massive Face Sets for Improved Face Recognition , 2017, 2017 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017).

[36]  Deva Ramanan,et al.  Face detection, pose estimation, and landmark localization in the wild , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[37]  Xiaoming Liu,et al.  Face Alignment in Full Pose Range: A 3D Total Solution , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[38]  Xiangyu Zhu,et al.  Face Alignment in Full Pose Range: A 3D Total Solution , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[39]  Ira Kemelmacher-Shlizerman,et al.  The MegaFace Benchmark: 1 Million Faces for Recognition at Scale , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[40]  ZissermanAndrew,et al.  The Pascal Visual Object Classes Challenge , 2015 .

[41]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[42]  Yinda Zhang,et al.  LSUN: Construction of a Large-scale Image Dataset using Deep Learning with Humans in the Loop , 2015, ArXiv.

[43]  Hao Li,et al.  Real-Time Facial Segmentation and Performance Capture from RGB Input , 2016, ECCV.

[44]  Gang Hua,et al.  Labeled Faces in the Wild: A Survey , 2016 .

[45]  Xiaogang Wang,et al.  Deep Learning Face Attributes in the Wild , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).

[46]  Yuxiao Hu,et al.  MS-Celeb-1M: A Dataset and Benchmark for Large-Scale Face Recognition , 2016, ECCV.

[47]  Koen E. A. van de Sande,et al.  Segmentation as selective search for object recognition , 2011, 2011 International Conference on Computer Vision.

[48]  Jianguo Zhang,et al.  The PASCAL Visual Object Classes Challenge , 2006 .