Latent max-margin metric learning for comparing video face tubes

Comparing “face tubes” is a key component of modern systems for face biometrics based video analysis and annotation. We present a novel algorithm to learn a distance metric between such spatio-temporal face tubes in videos. The main novelty in the algorithm is based on incorporation of latent variables in a max-margin metric learning framework. The latent formulation allows us to model, and learn metrics to compare faces under different challenging variations in pose, expressions and lighting. We propose a novel dataset named TV Series Face Tubes (TSFT) for evaluating the task. The dataset is collected from 12 different episodes of 8 popular TV series and has 94 subjects with 569 manually annotated face tracks in total. We show quantitatively how incorporating latent variables in max-margin metric learning leads to improvement of current state-of-the-art metric learning methods for the two cases when the testing is done with subjects that were seen during training and when the test subjects were not seen at all during training. We also give results on a challenging benchmark dataset: YouTube faces, and place our algorithm in context w.r.t. existing methods.

[1]  Thomas Mensink,et al.  Image Classification with the Fisher Vector: Theory and Practice , 2013, International Journal of Computer Vision.

[2]  Ming Yang,et al.  DeepFace: Closing the Gap to Human-Level Performance in Face Verification , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[3]  Andrew Zisserman,et al.  Efficient additive kernels via explicit feature maps , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[4]  Marc Sebban,et al.  A Survey on Metric Learning for Feature Vectors and Structured Data , 2013, ArXiv.

[5]  Andrew Zisserman,et al.  Efficient Additive Kernels via Explicit Feature Maps , 2012, IEEE Trans. Pattern Anal. Mach. Intell..

[6]  Anil K. Jain,et al.  Video-to-video face matching: Establishing a baseline for unconstrained face recognition , 2013, 2013 IEEE Sixth International Conference on Biometrics: Theory, Applications and Systems (BTAS).

[7]  Tsuhan Chen,et al.  The CMU Face In Action (FIA) Database , 2005, AMFG.

[8]  PerronninFlorent,et al.  Good Practice in Large-Scale Learning for Image Classification , 2014 .

[9]  Tal Hassner,et al.  Similarity Scores Based on Background Samples , 2009, ACCV.

[10]  Cordelia Schmid,et al.  Is that you? Metric learning approaches for face identification , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[11]  C. Schmid,et al.  Description of Interest Regions with Center-Symmetric Local Binary Patterns , 2006, ICVGIP.

[12]  Cordelia Schmid,et al.  Unsupervised metric learning for face identification in TV video , 2011, 2011 International Conference on Computer Vision.

[13]  Matti Pietikäinen,et al.  A comparative study of texture measures with classification based on featured distributions , 1996, Pattern Recognit..

[14]  Gaurav Sharma,et al.  Local Higher-Order Statistics (LHS) for Texture Categorization and Facial Analysis , 2012, ECCV.

[15]  Frédéric Jurie,et al.  Face Recognition using Local Quantized Patterns , 2012, BMVC.

[16]  Kilian Q. Weinberger,et al.  Distance Metric Learning for Large Margin Nearest Neighbor Classification , 2005, NIPS.

[17]  Yaniv Taigman,et al.  Descriptor Based Methods in the Wild , 2008 .

[18]  David J. Kriegman,et al.  Online learning of probabilistic appearance manifolds for video-based recognition and tracking , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[19]  Tal Hassner,et al.  Effective Unconstrained Face Recognition by Combining Multiple Descriptors and Learned Background Statistics , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[20]  Jiwen Lu,et al.  Robust Feature Set Matching for Partial Face Recognition , 2013, 2013 IEEE International Conference on Computer Vision.

[21]  Tomer Hertz,et al.  Learning a Mahalanobis Metric from Equivalence Constraints , 2005, J. Mach. Learn. Res..

[22]  Patrick Pérez,et al.  Automatic Face Reenactment , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[23]  Inderjit S. Dhillon,et al.  Information-theoretic metric learning , 2006, ICML '07.

[24]  Massimiliano Pontil,et al.  Large Margin Local Metric Learning , 2014, ECCV.

[25]  Jitendra Malik,et al.  Learning Globally-Consistent Local Distance Functions for Shape-Based Image Retrieval and Classification , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[26]  Peng Li,et al.  Similarity Metric Learning for Face Recognition , 2013, 2013 IEEE International Conference on Computer Vision.

[27]  Frédéric Jurie,et al.  PCCA: A new approach for distance learning from sparse pairwise constraints , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[28]  Gang Hua,et al.  Probabilistic Elastic Matching for Pose Variant Face Verification , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[29]  Shree K. Nayar,et al.  Attribute and simile classifiers for face verification , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[30]  Matti Pietikäinen,et al.  Face Description with Local Binary Patterns: Application to Face Recognition , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[31]  Xiaogang Wang,et al.  Hybrid Deep Learning for Face Verification , 2013, 2013 IEEE International Conference on Computer Vision.

[32]  Oren Barkan,et al.  Fast High Dimensional Vector Multiplication Face Recognition , 2013, 2013 IEEE International Conference on Computer Vision.

[33]  Michael I. Jordan,et al.  Distance Metric Learning with Application to Clustering with Side-Information , 2002, NIPS.

[34]  Andrew Zisserman,et al.  A Compact and Discriminative Face Track Descriptor , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[35]  Jian Sun,et al.  Blessing of Dimensionality: High-Dimensional Feature and Its Efficient Compression for Face Verification , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[36]  Marwan Mattar,et al.  Labeled Faces in the Wild: A Database forStudying Face Recognition in Unconstrained Environments , 2008 .

[37]  David J. Kriegman,et al.  Visual tracking and recognition using probabilistic appearance manifolds , 2005, Comput. Vis. Image Underst..

[38]  Andrew Zisserman,et al.  Fisher Vector Faces in the Wild , 2013, BMVC.

[39]  Peter N. Belhumeur,et al.  Tom-vs-Pete Classifiers and Identity-Preserving Alignment for Face Verification , 2012, BMVC.

[40]  Li Bai,et al.  Cosine Similarity Metric Learning for Face Verification , 2010, ACCV.

[41]  Shiguang Shan,et al.  Fusing Robust Face Region Descriptors via Multiple Metric Learning for Face Recognition in the Wild , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[42]  Xiaoyang Tan,et al.  Enhanced Local Texture Feature Sets for Face Recognition Under Difficult Lighting Conditions , 2007, IEEE Transactions on Image Processing.

[43]  Peter N. Belhumeur,et al.  POOF: Part-Based One-vs.-One Features for Fine-Grained Categorization, Face Verification, and Attribute Estimation , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[44]  Lior Wolf,et al.  The SVM-Minus Similarity Score for Video Face Recognition , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[45]  Andrew Zisserman,et al.  “Who are you?” - Learning person specific classifiers from video , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[46]  Vladimir Pavlovic,et al.  Face tracking and recognition with visual constraints in real-world videos , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[47]  Tal Hassner,et al.  Face recognition in unconstrained videos with matched background similarity , 2011, CVPR 2011.