Multiple face tracking and recognition with identity-specific localized metric learning

Abstract: While face recognition has been studied intensively in the literature, there have been only a few attempts at recognizing multiple faces simultaneously in videos, a task with potential applications in practical video surveillance. In this paper, we address the problem of visual recognition and tracking of multiple faces in real-world videos involving large pose variation and occlusion. Instead of recognizing each face independently, we impose constraints of inter-frame temporal smoothness and within-frame identity exclusivity on the multiple faces in a video, and model the tasks of multiple face recognition (MFR) and multiple face tracking (MFT) jointly in an alternating optimization framework. We show that this joint formulation of the two tasks leads to significantly improved MFR accuracy. Specifically, because appearance matching of face instances over consecutive frames plays a critical role in MFT, we propose an identity-specific metric learning method with a part-based object representation. It learns a localized transformation for each face subject in an online manner, under which face instances of the same subject over consecutive frames are pulled as close together as possible, while those of different subjects are pushed far apart. Empirically, we evaluate our method on several MFR sequences against baselines, and the results demonstrate that our method achieves improved accuracy in various challenging recognition scenarios.
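The within-frame identity exclusivity constraint mentioned in the abstract can be illustrated with a small sketch. This is not the paper's optimization procedure; it only shows, under stated assumptions, why labeling faces jointly can differ from labeling each face on its own. The cost matrix, the function names `assign_exclusive` and `assign_independent`, and the brute-force search are all hypothetical choices made for illustration.

```python
from itertools import permutations


def assign_independent(cost):
    """Label each face with its cheapest identity, ignoring the other faces."""
    return [min(range(len(row)), key=row.__getitem__) for row in cost]


def assign_exclusive(cost):
    """Return the one-to-one face -> identity assignment with minimum total cost.

    cost[i][j] is an assumed dissimilarity between face i and identity j.
    The search is brute force over permutations, so it is only suitable
    for a handful of faces per frame.
    """
    n_faces = len(cost)
    n_ids = len(cost[0])
    if n_faces > n_ids:
        raise ValueError("more faces than identities")
    best_total, best_assign = float("inf"), None
    for perm in permutations(range(n_ids), n_faces):
        total = sum(cost[i][j] for i, j in enumerate(perm))
        if total < best_total:
            best_total, best_assign = total, list(perm)
    return best_assign, best_total


# Hypothetical dissimilarities for three faces (rows) and three identities (columns).
cost = [
    [0.20, 0.60, 0.90],
    [0.25, 0.30, 0.80],
    [0.70, 0.75, 0.40],
]

independent = assign_independent(cost)      # two faces may share one identity
exclusive, total = assign_exclusive(cost)   # each identity is used at most once
print(independent, exclusive)
```

In this toy frame, independent labeling gives two faces the same identity, whereas the exclusive assignment forces distinct identities at a slightly higher cost for one face. For more than a few faces, a polynomial-time assignment solver would be a more sensible choice than enumerating permutations.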
