Video-to-video face matching: Establishing a baseline for unconstrained face recognition

Face recognition in video is becoming increasingly important because of the abundance of video data captured by surveillance cameras, mobile devices, Internet uploads, and other sources. Because a video aggregates facial information across a sequence of face images (frames), video-based face recognition can potentially alleviate the classic challenges caused by variations in pose, illumination, and expression. However, as more algorithms are crafted specifically for video-based face recognition, it is important to establish a baseline accuracy using state-of-the-art still-image matchers. Note that most commercial off-the-shelf (COTS) offerings are still limited to single-frame matching. To measure the accuracy of COTS face recognition systems on video data, we first investigate the effectiveness of multi-frame score-level fusion and analyze the consistency across three COTS face matchers. We demonstrate that each of the three COTS matchers individually outperforms previously published face recognition results on the unconstrained YouTube Faces database. Further, fusing the scores from the three COTS matchers achieves a 20% improvement in accuracy over previously published results. We encourage the use of these results as a competitive baseline for video-to-video face matching on the YouTube Faces database.
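As a rough illustration of what multi-frame score-level fusion can look like, the sketch below collapses a matrix of frame-to-frame similarity scores for one video pair into a single score, and then combines the per-matcher video scores after normalization. The abstract does not state which fusion or normalization rules were used, so the mean and max rules, the min-max normalization, and all function names here (`fuse_frames`, `min_max_normalize`, `fuse_matchers`) are assumptions chosen for illustration, not the authors' method or any COTS matcher's API.

```python
from statistics import mean


def fuse_frames(scores, rule="mean"):
    """Collapse an m x n matrix of frame-pair similarity scores
    (m frames of video A vs. n frames of video B) into one video-pair score.
    The available rules are illustrative assumptions."""
    flat = [s for row in scores for s in row]
    if not flat:
        raise ValueError("score matrix is empty")
    if rule == "mean":
        return mean(flat)
    if rule == "max":
        return max(flat)
    raise ValueError(f"unknown fusion rule: {rule}")


def min_max_normalize(score, lo, hi):
    """Map a raw matcher score into [0, 1] given that matcher's score range.
    In practice lo and hi would be estimated from training data (assumed here)."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    return (score - lo) / (hi - lo)


def fuse_matchers(video_scores, ranges):
    """Average the normalized video-pair scores from several matchers."""
    if len(video_scores) != len(ranges):
        raise ValueError("need one (lo, hi) range per matcher score")
    normalized = [
        min_max_normalize(s, lo, hi) for s, (lo, hi) in zip(video_scores, ranges)
    ]
    return mean(normalized)


# Hypothetical scores: two frames per video, two matchers with different ranges.
matcher_a = [[0.2, 0.4], [0.6, 0.8]]          # scores in [0, 1]
matcher_b = [[20.0, 40.0], [60.0, 80.0]]      # scores in [0, 100]

video_a = fuse_frames(matcher_a, rule="mean")
video_b = fuse_frames(matcher_b, rule="mean")
fused = fuse_matchers([video_a, video_b], [(0.0, 1.0), (0.0, 100.0)])
```

Normalizing before averaging matters because different matchers typically report scores on different scales; without it, the matcher with the widest range would dominate the fused score.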
