Robust Face Recognition via Multimodal Deep Face Representation

Face images appearing in multimedia applications, e.g., social networks and digital entertainment, usually exhibit dramatic pose, illumination, and expression variations, resulting in considerable performance degradation for traditional face recognition algorithms. This paper proposes a comprehensive deep learning framework to jointly learn face representation using multimodal information. The proposed deep learning structure is composed of a set of elaborately designed convolutional neural networks (CNNs) and a three-layer stacked auto-encoder (SAE). The set of CNNs extracts complementary facial features from multimodal data. The extracted features are then concatenated to form a high-dimensional feature vector, whose dimension is compressed by the SAE. All of the CNNs are trained using a subset of 9,000 subjects from the publicly available CASIA-WebFace database, which ensures the reproducibility of this work. Using the proposed single CNN architecture and limited training data, a verification rate of 98.43% is achieved on the LFW database. Benefiting from the complementary information contained in multimodal data, our small ensemble system achieves a recognition rate higher than 99.0% on LFW using a publicly available training set.
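
The concatenate-then-compress step described above can be illustrated with a minimal sketch. It is not the authors' implementation. The number of networks, the feature sizes, the layer widths, the sigmoid activation, and the helper names (`concatenate`, `make_layer`, `encode`) are all assumptions chosen for illustration. The weights are random and untrained, so only the data flow is shown, using just the encoder half of an auto-encoder.

```python
import math
import random


def concatenate(features):
    """Join the per-network feature vectors into one long vector."""
    return [x for vec in features for x in vec]


def make_layer(n_in, n_out, rng):
    """Random (untrained) weights and zero biases for one encoder layer."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def encode(vector, layers):
    """Apply each encoder layer in turn: sigmoid(W x + b)."""
    for weights, biases in layers:
        vector = [
            1.0 / (1.0 + math.exp(-(sum(w * x for w, x in zip(row, vector)) + b)))
            for row, b in zip(weights, biases)
        ]
    return vector


rng = random.Random(0)

# Stand-ins for the outputs of three CNNs (hypothetical: 8 values each).
cnn_features = [[rng.random() for _ in range(8)] for _ in range(3)]

# Step 1: concatenate into a single high-dimensional vector.
joint = concatenate(cnn_features)

# Step 2: compress with three stacked encoder layers (hypothetical widths).
sizes = [len(joint), 16, 8, 4]
layers = [make_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]
compact = encode(joint, layers)

print(len(joint), "->", len(compact))
```

In a real system each encoder layer would first be trained to reconstruct its input before the layers are stacked. That training is omitted here.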
