A Compact and Discriminative Face Track Descriptor

Our goal is to learn a compact, discriminative vector representation of a face track, suitable for the face recognition tasks of verification and classification. To this end, we propose a novel face track descriptor, based on the Fisher Vector representation, and demonstrate that it has a number of favourable properties. First, the descriptor is suitable for tracks of both frontal and profile faces, and is insensitive to their pose. Second, the descriptor is compact due to discriminative dimensionality reduction, and it can be further compressed using binarization. Third, the descriptor can be computed quickly (using hard quantization); its compact size and fast computation make it well suited to large-scale visual repositories. Finally, the descriptor generalizes well when trained on one dataset and tested on another, reflecting its tolerance to dataset bias. In the experiments we show that the descriptor exceeds the state of the art on both the face verification task (the YouTube Faces, without outside training data, and INRIA-Buffy benchmarks) and the face classification task (using the Oxford-Buffy dataset).
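The pipeline summarized above (Fisher Vector encoding with hard quantization, pooling over a track, projection to a low dimension, and binarization) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names `hard_fisher_vector` and `compress_and_binarize` are hypothetical, the hard assignment uses a variance-normalised nearest-mean rule as a simplification, the projection `W` is assumed to have been learned elsewhere (a random matrix stands in for it in any demo), and sign thresholding of a linear expansion `U` is only one possible binarization scheme.

```python
import numpy as np


def hard_fisher_vector(X, means, sigmas, priors):
    """Fisher Vector of a feature set under a diagonal GMM, with hard assignment.

    X      : (N, D) local features pooled over all frames of one face track.
    means  : (K, D) Gaussian means.
    sigmas : (K, D) Gaussian standard deviations (diagonal covariances).
    priors : (K,)   mixture weights.
    Returns a unit-norm vector of length 2 * K * D.
    """
    N, D = X.shape
    K = means.shape[0]

    # Whitened differences to every Gaussian: shape (N, K, D).
    diff = (X[:, None, :] - means[None, :, :]) / sigmas[None, :, :]

    # Hard quantization (assumption: nearest Gaussian in whitened distance,
    # ignoring the prior and normalisation terms of the true posterior).
    assign = np.argmin((diff ** 2).sum(axis=2), axis=1)

    fv = np.zeros((K, 2 * D))
    for k in range(K):
        d = diff[assign == k, k, :]
        if d.shape[0] == 0:
            continue  # no feature assigned to this Gaussian
        # First-order (mean) and second-order (variance) statistics.
        fv[k, :D] = d.sum(axis=0) / (N * np.sqrt(priors[k]))
        fv[k, D:] = (d ** 2 - 1.0).sum(axis=0) / (N * np.sqrt(2.0 * priors[k]))

    fv = fv.ravel()
    fv = np.sign(fv) * np.sqrt(np.abs(fv))  # signed square-root normalisation
    norm = np.linalg.norm(fv)
    return fv / norm if norm > 0 else fv


def compress_and_binarize(fv, W, U):
    """Project the descriptor with W (p x 2KD), then binarize with U (q x p).

    W stands in for a discriminatively learned projection; U is an
    illustrative expansion matrix. Both are assumptions of this sketch.
    """
    z = W @ fv            # compact real-valued descriptor, length p
    bits = (U @ z) > 0    # binary code, length q
    return z, bits
```

Because all local features of a track are pooled into one encoding, the descriptor length depends only on the GMM size and feature dimension, not on the number of frames. Two binary codes can then be compared with the Hamming distance, for example `np.count_nonzero(bits_a != bits_b)`.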
