Unsupervised learning from local features for video-based face recognition

This paper presents an unsupervised learning approach to video-based face recognition that makes no assumptions about pose, expression, or prior localization of facial landmarks. The proposed algorithm exploits spatiotemporal information obtained from local features that are extracted at arbitrary keypoints on faces rather than at pre-defined landmarks. Because it relies on local features, the algorithm is inherently robust to large-scale occlusions. During unsupervised learning, faces from a video sequence are automatically clustered based on the similarity of their local features, and a voting-based algorithm is employed to pick the representative features of each cluster. During recognition, the video frames of a probe are sequentially matched to the clusters of all individuals in the gallery, and the probe's identity is decided on the basis of the best temporally cohesive cluster matches. By utilizing the temporal dimension, the proposed algorithm can also detect sudden identity changes in a video. The algorithm was tested on the Honda/UCSD video database and achieved a maximum recognition rate of 99.5%.
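The temporal part of the recognition step can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes each probe frame has already been matched to the gallery and reduced to a single best-matching identity label, and it treats "temporally cohesive" as a run of consecutive frames that agree on that label. The function names `cohesive_runs` and `identify` and the `min_run` threshold are hypothetical and are introduced only for illustration.

```python
from itertools import groupby


def cohesive_runs(frame_labels, min_run=3):
    """Return (label, start_frame, length) for every run of identical
    consecutive labels that lasts at least min_run frames."""
    runs = []
    start = 0
    for label, group in groupby(frame_labels):
        length = len(list(group))
        if length >= min_run:
            runs.append((label, start, length))
        start += length
    return runs


def identify(frame_labels, min_run=3):
    """Pick the identity supported by the most frames inside cohesive runs,
    and report the frames at which a cohesive run of a different identity begins.

    Assumption: short runs (isolated mismatches) are treated as noise."""
    runs = cohesive_runs(frame_labels, min_run)
    if not runs:
        return None, []
    totals = {}
    for label, _, length in runs:
        totals[label] = totals.get(label, 0) + length
    identity = max(totals, key=totals.get)
    # A change is flagged where two successive cohesive runs disagree.
    changes = [
        runs[i][1]
        for i in range(1, len(runs))
        if runs[i][0] != runs[i - 1][0]
    ]
    return identity, changes


# Hypothetical per-frame match labels for a 12-frame probe video.
labels = ["A", "A", "A", "B", "A", "A", "A", "A", "C", "C", "C", "C"]
identity, changes = identify(labels)
```

In this example the single "B" frame is discarded as an isolated mismatch, "A" is chosen because it has the most frames inside cohesive runs, and frame 8 is flagged as a possible identity change because a sustained run of "C" begins there.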
