Domain-Specific Face Synthesis for Video Face Recognition From a Single Sample Per Person

In video surveillance, face recognition (FR) systems are employed to detect individuals of interest appearing over a distributed network of cameras. The performance of still-to-video FR systems can decline significantly because faces captured in the unconstrained operational domain (OD) over multiple video cameras follow a different underlying data distribution than faces captured under controlled conditions in the enrollment domain with a still camera. This is particularly true when individuals are enrolled in the system using a single reference still. To improve the robustness of these systems, it is possible to augment the reference set by generating synthetic faces based on the original still. However, without knowledge of the OD, many synthetic images must be generated to account for all possible capture conditions. FR systems may, therefore, require complex implementations and yield lower accuracy when trained on many less relevant images. This paper introduces an algorithm for domain-specific face synthesis (DSFS) that exploits the representative intra-class variation information available from the OD. Prior to operation (during camera calibration), a compact set of faces from unknown persons appearing in the OD is selected through affinity propagation clustering in the capture condition space (defined by pose and illumination estimation). The domain-specific variations of these face images are then projected onto the reference still of each individual by integrating an image-based face relighting technique within a 3-D reconstruction framework. The result is a compact set of synthetic faces that resemble individuals of interest under the capture conditions relevant to the OD.
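The exemplar selection step can be illustrated with a minimal sketch of affinity propagation in plain Python. It is not the authors' implementation: the two-dimensional descriptors (standing in for normalized pose and illumination estimates), the negative squared Euclidean similarity, the median preference, and the function name `affinity_propagation` are all assumptions made for illustration.

```python
from statistics import median


def affinity_propagation(points, damping=0.5, iterations=200):
    """Return (exemplar_indices, labels) for a list of feature tuples."""
    n = len(points)
    # Similarity: negative squared Euclidean distance (an assumed choice).
    s = [[-sum((a - b) ** 2 for a, b in zip(p, q)) for q in points]
         for p in points]
    # Shared preference: median of the off-diagonal similarities.
    pref = median(s[i][k] for i in range(n) for k in range(n) if i != k)
    for i in range(n):
        s[i][i] = pref

    r = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    a = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iterations):
        # r(i,k) <- s(i,k) - max over k' != k of [a(i,k') + s(i,k')]
        for i in range(n):
            vals = [a[i][k] + s[i][k] for k in range(n)]
            for k in range(n):
                best_other = max(v for j, v in enumerate(vals) if j != k)
                new = s[i][k] - best_other
                r[i][k] = damping * r[i][k] + (1 - damping) * new
        # a(i,k) <- min(0, r(k,k) + sum of positive r(i',k), i' not in {i,k})
        # a(k,k) <- sum of positive r(i',k), i' != k
        for k in range(n):
            pos = [max(0.0, r[i][k]) for i in range(n)]
            total = sum(pos) - pos[k]
            for i in range(n):
                if i == k:
                    new = total
                else:
                    new = min(0.0, r[k][k] + total - pos[i])
                a[i][k] = damping * a[i][k] + (1 - damping) * new

    # A point is an exemplar when its self-evidence is positive.
    exemplars = [k for k in range(n) if r[k][k] + a[k][k] > 0]
    if not exemplars:
        # Fallback for the sketch: keep the single strongest candidate.
        exemplars = [max(range(n), key=lambda k: r[k][k] + a[k][k])]
    # Exemplars label themselves; other points join the most similar exemplar.
    labels = [i if i in exemplars else max(exemplars, key=lambda e: s[i][e])
              for i in range(n)]
    return exemplars, labels


# Toy capture-condition descriptors (hypothetical, already normalized):
# two groups of three faces captured under similar conditions.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0),
          (10.0, 0.0), (11.0, 0.0), (12.0, 0.0)]
exemplars, labels = affinity_propagation(points)
```

In this toy setting each exemplar plays the role of one representative OD face; the number of exemplars is not fixed in advance but follows from the preference value.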
In a particular implementation based on sparse representation classification, the synthetic faces generated with DSFS are employed to form a cross-domain dictionary that accounts for structured sparsity, where the dictionary blocks combine the original and synthetic faces of each individual. Experimental results obtained with videos from the Chokepoint and COX-S2V data sets reveal that augmenting the reference gallery set of still-to-video FR systems using the proposed DSFS approach can provide a significantly higher level of accuracy than state-of-the-art approaches, with only a moderate increase in computational complexity.
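The block structure of the dictionary can be written down as follows. This is one common block-sparse formulation, given as a sketch under stated assumptions and not necessarily the exact objective used in the paper. The symbols are introduced here for illustration: $K$ enrolled individuals, $r_k$ the reference still of individual $k$, $s_{k,1},\dots,s_{k,q}$ the synthetic faces, $y$ a probe face from video, and $\varepsilon$ a noise tolerance.

```latex
% Cross-domain dictionary: one block per individual, combining the
% reference still with the synthetic faces generated for that individual.
D = [\, D_1, D_2, \dots, D_K \,], \qquad
D_k = [\, r_k, \; s_{k,1}, \dots, s_{k,q} \,]

% Block-sparse coding of a probe y: few active blocks, not just few atoms.
\hat{x} = \arg\min_{x} \sum_{k=1}^{K} \lVert x_k \rVert_2
\quad \text{subject to} \quad \lVert y - D x \rVert_2 \le \varepsilon

% Decision rule: assign the identity whose block best reconstructs y.
\hat{k} = \arg\min_{k} \lVert y - D_k \hat{x}_k \rVert_2
```

Here $x_k$ denotes the coefficients of $x$ associated with block $D_k$. Penalizing the sum of block norms encourages the probe to be explained by the faces of a single individual, which is the sense in which the dictionary accounts for structured sparsity.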
