Depth as Attention for Face Representation Learning

Face representation learning has recently achieved great success in applications such as verification and identification. However, face recognition approaches based purely on RGB images rely solely on intensity information and are therefore more sensitive to facial variations, notably pose and occlusions, and to environmental changes such as illumination and background. We propose a novel depth-guided attention mechanism for deep multi-modal face recognition using low-cost RGB-D sensors. The mechanism directs the deep network “where to look” for visual features in the RGB image by focusing the network's attention using depth features extracted by a Convolutional Neural Network (CNN). The depth features help the network focus on regions of the face in the RGB image that contain more prominent person-specific information. The mechanism then uses this correlation between the depth features and the informative RGB regions to generate an attention map for RGB images from the CNN-extracted depth features. We test our network on four public datasets, Lock3DFace, CurtinFaces, IIIT-D RGB-D, and KaspAROV, which include challenging variations in pose, occlusion, illumination, expression, and time lapse, and show that the features obtained by our solution yield better results on all four. Our solution achieves average accuracies (improvements) of 87.3% (+5.0%), 99.1% (+0.9%), 99.7% (+0.6%), and 95.3% (+0.5%) on the four datasets respectively, thereby improving the state of the art. We also perform additional experiments with thermal images in place of depth images, showing that our solution generalizes well when a modality other than depth guides the attention mechanism.
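The general idea of deriving a spatial attention map from one modality and applying it to another can be illustrated with a minimal sketch. This is not the architecture of the paper: the channel averaging, the softmax over locations, the function names `depth_attention_map` and `apply_attention`, and the toy values are all assumptions made for illustration, and the feature maps here stand in for CNN outputs.

```python
import math

def depth_attention_map(depth_feat):
    """Turn depth features (C channels of H x W floats) into an H x W map summing to 1."""
    channels = len(depth_feat)
    height, width = len(depth_feat[0]), len(depth_feat[0][0])
    # Average the depth channels into one score per location (assumed pooling).
    scores = [[sum(depth_feat[c][y][x] for c in range(channels)) / channels
               for x in range(width)] for y in range(height)]
    # Softmax over all locations; subtracting the max keeps exp() stable.
    peak = max(max(row) for row in scores)
    exps = [[math.exp(s - peak) for s in row] for row in scores]
    total = sum(sum(row) for row in exps)
    return [[e / total for e in row] for row in exps]

def apply_attention(rgb_feat, attention_map):
    """Scale every RGB feature channel by the attention weight at each location."""
    return [[[v * w for v, w in zip(row, wrow)]
             for row, wrow in zip(channel, attention_map)]
            for channel in rgb_feat]

# Toy 2-channel, 2x2 feature maps (made-up values, not real CNN outputs).
depth = [[[0.0, 1.0], [2.0, 3.0]], [[0.0, 1.0], [2.0, 3.0]]]
rgb = [[[1.0, 1.0], [1.0, 1.0]], [[2.0, 2.0], [2.0, 2.0]]]
amap = depth_attention_map(depth)
attended = apply_attention(rgb, amap)
```

In this sketch, locations with stronger depth responses receive larger weights, so the corresponding RGB features contribute more to whatever layer consumes `attended`.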
