Multimodal facial biometrics recognition: Dual-stream convolutional neural networks with multi-feature fusion layers

Abstract Facial recognition for surveillance applications remains challenging in uncontrolled environments, especially when faces are partly covered by masks or veils and when appearance varies across ethnicities. Multimodal facial biometrics recognition has become a major line of research for handling such scenarios. However, to handle multimodal facial biometrics, many existing deep learning networks rely on feature concatenation or weight combination to construct a representation layer for the desired recognition task. This concatenation is often inefficient, because it does not exploit the multimodal data effectively enough to improve recognition performance. This paper therefore proposes multi-feature fusion layers for multimodal facial biometrics, which lead to more informative data learning in dual-stream convolutional neural networks. Specifically, the network consists of two progressive parts with distinct fusion strategies that aggregate RGB data and texture descriptors for multimodal facial biometrics. We demonstrate that the proposed network offers a discriminative feature representation and that the multi-feature fusion layers yield a gain in accuracy. We also introduce and share a new benchmark dataset for multimodal facial biometric data, the Ethnic-facial dataset. In addition, four publicly accessible datasets, namely the AR, FaceScrub, IMDB_WIKI, and YouTube Face datasets, are used to evaluate the proposed network. In our experimental analysis, the proposed network outperformed several competing networks on these datasets for both recognition and verification tasks.
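For orientation, the two baseline strategies the abstract contrasts with, feature concatenation and weight combination, can be sketched on toy feature vectors. This is only an illustrative sketch and not the proposed multi-feature fusion layers: the function names `concat_fusion` and `weighted_fusion`, the weight `w_rgb`, and the example vectors are all hypothetical, and real stream outputs would be CNN feature maps rather than short lists.

```python
from typing import List


def concat_fusion(rgb_feat: List[float], tex_feat: List[float]) -> List[float]:
    """Feature concatenation: place the two stream outputs end to end."""
    return list(rgb_feat) + list(tex_feat)


def weighted_fusion(
    rgb_feat: List[float], tex_feat: List[float], w_rgb: float = 0.5
) -> List[float]:
    """Weight combination: convex blend of two equal-length feature vectors."""
    if len(rgb_feat) != len(tex_feat):
        raise ValueError("both streams must produce vectors of the same length")
    if not 0.0 <= w_rgb <= 1.0:
        raise ValueError("w_rgb must lie in [0, 1]")
    w_tex = 1.0 - w_rgb
    return [w_rgb * r + w_tex * t for r, t in zip(rgb_feat, tex_feat)]


# Hypothetical outputs of an RGB stream and a texture-descriptor stream.
rgb = [1.0, 2.0, 3.0]
tex = [3.0, 2.0, 1.0]

fused_cat = concat_fusion(rgb, tex)               # length doubles
fused_avg = weighted_fusion(rgb, tex, w_rgb=0.5)  # length is preserved
```

Concatenation keeps both vectors intact but doubles the representation size, while the weighted blend keeps the size fixed and applies one global weight to every feature. Neither models interactions between the two modalities, which is the limitation the abstract points to.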
