Video-Based Face Association and Identification

In this paper, we present a new video-based face identification algorithm for probe videos that may consist of multiple shots and in which the target (i.e., the person of interest) is annotated only once, with a face bounding box in a single frame. Most video face identification techniques assume that the video consists of a single shot, so that the bounding boxes of the target face can be extracted by tracking the face across the video frames. Such automatic annotation, however, is vulnerable to drift in the face tracker, and face tracking alone cannot associate the face images of the target across multiple shots. We propose a target face association (TFA) technique that retrieves a set of representative face images from a given video that are likely to have the same identity as the target face. These face images are then used to construct a robust representation of the target face for searching for the corresponding subject in the gallery. Since two faces that appear in the same video frame cannot belong to the same person, we use such cannot-link constraints to learn a target-specific linear classifier that establishes the intra-shot and inter-shot face association of the target. Experimental results on the newly released JANUS challenge set 3 (JANUS CS3) dataset show that our method generates robust representations from target-annotated videos and performs well on the video-based face identification task.
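The cannot-link idea can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify the feature extractor, the classifier's loss, or the retrieval threshold, so everything below is an assumption made for illustration. Faces that share a frame with the annotated target are taken as negatives, a linear classifier is fitted with plain logistic-regression gradient descent (a stand-in for whatever solver the authors use), and the remaining faces with a positive score are treated as associated with the target. The function names, the two-dimensional toy features, and the zero threshold are all hypothetical.

```python
import math


def cannot_link_negatives(frame_ids, target_idx):
    """Return indices of faces that share a frame with the target face.

    Two faces in the same frame cannot be the same person, so these
    detections can serve as negative examples for the target.
    """
    target_frame = frame_ids[target_idx]
    return [i for i, f in enumerate(frame_ids)
            if f == target_frame and i != target_idx]


def train_linear_classifier(positives, negatives, epochs=200, lr=0.5):
    """Fit weights w and bias b by stochastic gradient descent on the
    logistic loss (an assumed stand-in for the paper's solver)."""
    dim = len(positives[0])
    w, b = [0.0] * dim, 0.0
    data = [(x, 1.0) for x in positives] + [(x, 0.0) for x in negatives]
    for _ in range(epochs):
        for x, y in data:
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - y  # gradient of the logistic loss w.r.t. z
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b


def score(w, b, x):
    """Signed linear score; positive means 'looks like the target'."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


# Toy data (hypothetical): four detections over two frames.
# Face 0 is the annotated target; face 1 appears in the same frame.
frame_ids = [0, 0, 1, 1]
features = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
target_idx = 0

neg_idx = cannot_link_negatives(frame_ids, target_idx)
w, b = train_linear_classifier(
    [features[target_idx]], [features[i] for i in neg_idx]
)

# Faces elsewhere in the video that the classifier associates with the target.
associated = [i for i in range(len(features))
              if i != target_idx and score(w, b, features[i]) > 0]
```

In this toy setting the face in the second frame whose feature lies close to the target's is retrieved, while the face resembling the co-occurring person is rejected. In practice the features would come from a face descriptor and the positive set would grow as more target faces are associated.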
