On the use of automatically generated synthetic image datasets for benchmarking face recognition

The availability of large-scale face datasets has been key to progress in face recognition. However, some datasets are no longer available because of licensing issues or copyright infringement (e.g. MS-Celeb-1M). Recent advances in Generative Adversarial Networks (GANs) for synthesizing realistic face images provide a pathway to replacing real datasets with synthetic ones, both to train and to benchmark face recognition (FR) systems. This paper presents a study on benchmarking FR systems using a synthetic dataset. First, we introduce a methodology for generating a synthetic dataset without human intervention, by exploiting the latent structure of a StyleGAN2 model with multiple controlled factors of variation. Then, we confirm that (i) the generated synthetic identities are not data subjects from the GAN’s training dataset, which is verified on a synthetic dataset with 10K+ identities; and (ii) benchmarking results on the synthetic dataset are a good substitute, often providing error rates and system rankings similar to those obtained on the real dataset.
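
To make the comparison of "error rates and system ranking" concrete, the following is a minimal sketch of how such a check could be set up. It is not the authors' evaluation protocol: the function names, the fixed decision threshold, the use of Spearman rank correlation, and all numeric values are illustrative assumptions (the numbers are made-up placeholders, not results from the paper).

```python
import numpy as np


def error_rates(genuine, impostor, threshold):
    """Return (FMR, FNMR) for similarity scores at a fixed threshold.

    A comparison is accepted when its score is >= threshold (assumption).
    """
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.asarray(impostor, dtype=float)
    fmr = float(np.mean(impostor >= threshold))   # impostors wrongly accepted
    fnmr = float(np.mean(genuine < threshold))    # genuines wrongly rejected
    return fmr, fnmr


def ranks(values):
    """Rank values from 0 (smallest) upward; assumes no ties."""
    order = np.argsort(values)
    r = np.empty(len(values), dtype=float)
    r[order] = np.arange(len(values))
    return r


def spearman(a, b):
    """Spearman rank correlation between two score lists (no tie handling)."""
    return float(np.corrcoef(ranks(a), ranks(b))[0, 1])


# Hypothetical placeholder error rates for three FR systems,
# measured once on a real benchmark and once on a synthetic one.
real_fnmr = [0.05, 0.02, 0.10]
synth_fnmr = [0.07, 0.03, 0.12]

rho = spearman(real_fnmr, synth_fnmr)
print(f"rank agreement (Spearman rho): {rho:.2f}")
```

A rank correlation close to 1 would indicate that the synthetic benchmark orders the systems the same way as the real one, even when the absolute error rates differ.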
