Joint space learning for video-based face recognition

The popularity of surveillance and mobile cameras provides great opportunities for video-based face recognition (VFR) in less-controlled conditions. This paper proposes a joint space learning method that simultaneously identifies the most representative samples and the most discriminative features in facial videos for reliable face recognition. Specifically, we use a mixture model that learns multiple feature spaces to capture the variations in the data, and the representative samples are learned within each subspace. Because the representative samples depend on the feature spaces and vice versa, this is a chicken-and-egg problem; we develop an alternating algorithm that monotonically optimizes the joint task. In addition, randomized techniques are applied to kernel approximation to capture the nonlinear structure in the data, which improves both the accuracy and the efficiency of our method. The proposed method outperforms state-of-the-art video-based face recognition methods on the Honda, MoBo, and YouTube Celebrities databases.
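The abstract does not say which randomized kernel approximation is used. As an illustration only, the sketch below assumes random Fourier features for a Gaussian (RBF) kernel, one common technique of this kind; the function names, the `gamma` value, and the feature count are hypothetical choices and not taken from the paper. It maps the data to an explicit feature space whose inner products approximate the kernel, so that linear methods can stand in for kernel methods.

```python
import numpy as np

def random_fourier_features(X, n_features=2000, gamma=0.5, seed=0):
    """Map X (n_samples, d) to features Z so that Z @ Z.T approximates
    the RBF kernel k(x, y) = exp(-gamma * ||x - y||^2).
    Assumed technique: random Fourier features (not confirmed by the paper)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Frequencies are drawn from the kernel's spectral density, N(0, 2*gamma*I).
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, n_features))
    # Random phases, uniform on [0, 2*pi).
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

def rbf_kernel(X, gamma=0.5):
    """Exact RBF kernel matrix, used here only to check the approximation."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-gamma * np.maximum(d2, 0.0))

# Toy data standing in for per-frame face descriptors (hypothetical).
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 5))

Z = random_fourier_features(X)
max_err = np.max(np.abs(Z @ Z.T - rbf_kernel(X)))
print(f"max absolute kernel approximation error: {max_err:.3f}")
```

The approximation error shrinks roughly as one over the square root of the number of random features, so the feature count trades accuracy against the cost of the downstream linear computations.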
