Teacher-Student Adversarial Depth Hallucination to Improve Face Recognition

We present the Teacher-Student Generative Adversarial Network (TS-GAN), which generates depth images from single RGB images in order to boost the performance of face recognition systems. For our method to generalize well to unseen datasets, we design the architecture around two components, a teacher and a student. The teacher, which itself consists of a generator and a discriminator, learns a latent mapping between input RGB and paired depth images in a supervised fashion. The student, which consists of two generators (one shared with the teacher) and a discriminator, learns from new RGB data for which no paired depth information is available, improving generalization. The fully trained shared generator can then be used at runtime to hallucinate depth from RGB for downstream applications such as face recognition. We perform rigorous experiments to show the superiority of TS-GAN over other methods in generating synthetic depth images. Moreover, face recognition experiments demonstrate that our hallucinated depth, used together with the input RGB images, boosts performance across various architectures compared to the RGB modality alone, by average values of +1.2%, +2.6%, and +2.6% on the IIITD, EURECOM, and LFW datasets, respectively. We make our implementation public at: https://github.com/hardikuppal/teacher-student-gan.git.
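The component layout described above can be illustrated with a minimal, framework-free sketch. It is not the released implementation: the class names, the toy one-weight "generators", and the L1 losses are all hypothetical. In particular, the sketch assumes that the student's second generator maps depth back to RGB and is trained with a reconstruction-style loss, which the abstract does not state. The adversarial terms are omitted, so the discriminators appear only as placeholders that mirror the structure.

```python
from dataclasses import dataclass
from typing import List

Image = List[float]  # toy stand-in for an image tensor


@dataclass
class Generator:
    """Toy generator: scales every pixel by a single weight (illustrative only)."""
    weight: float

    def __call__(self, x: Image) -> Image:
        return [self.weight * v for v in x]


@dataclass
class Discriminator:
    """Toy discriminator: scores an image by its mean intensity (placeholder)."""
    bias: float = 0.0

    def __call__(self, x: Image) -> float:
        return sum(x) / len(x) + self.bias


def l1(a: Image, b: Image) -> float:
    """Mean absolute difference between two equally sized images."""
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


@dataclass
class Teacher:
    rgb_to_depth: Generator      # generator shared with the student
    depth_critic: Discriminator  # unused here; adversarial term omitted

    def supervised_loss(self, rgb: Image, depth: Image) -> float:
        # Paired data: compare hallucinated depth with ground-truth depth.
        return l1(self.rgb_to_depth(rgb), depth)


@dataclass
class Student:
    rgb_to_depth: Generator      # the same object as the teacher's generator
    depth_to_rgb: Generator      # ASSUMED role of the second generator
    rgb_critic: Discriminator    # unused here; adversarial term omitted

    def unpaired_loss(self, rgb: Image) -> float:
        # No paired depth: ASSUMED reconstruction loss RGB -> depth -> RGB.
        return l1(self.depth_to_rgb(self.rgb_to_depth(rgb)), rgb)


# Both components hold a reference to one shared generator object.
shared = Generator(weight=0.5)
teacher = Teacher(shared, Discriminator())
student = Student(shared, Generator(weight=2.0), Discriminator())
```

Because `teacher.rgb_to_depth` and `student.rgb_to_depth` are the same object, any update made while training on unpaired RGB data also changes the generator that is later used on its own at runtime.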
