Identifying scenes with the same person in video content on the basis of scene continuity and face similarity measurement

Abstract We present a method that automatically annotates who appears in a video stream, and when, even when the video is shot under unstaged conditions. Previous face recognition methods were not robust to varying shooting conditions, such as changes in lighting and face direction, and therefore had difficulty identifying a person and the scenes in which that person appears. To overcome these difficulties, our method groups consecutive video frames (scenes) into clusters that each contain the same person’s face, which we call a “facial-temporal continuum,” and identifies the person by using the many video frames in each cluster. In our experiments, the accuracy of our method was approximately two to three times that of a previous method that recognizes a face in each frame separately.
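The core idea of the abstract, grouping consecutive face detections into a "facial-temporal continuum" and then identifying the person from the whole cluster rather than from a single frame, can be sketched as follows. This is a minimal illustration, not the paper's actual algorithm: the embeddings, the cosine-similarity measure, the thresholds, and the names `build_continua` and `identify` are all assumptions introduced for the example.

```python
import math

def cosine(a, b):
    # Cosine similarity between two face-embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def build_continua(frames, sim_threshold=0.8):
    """Group per-frame face embeddings (None = no face detected) into
    clusters of consecutive, mutually similar detections.
    Returns a list of (start_frame, end_frame, embeddings) tuples."""
    continua, current = [], None
    for i, emb in enumerate(frames):
        if emb is None:
            current = None          # scene break: no face in this frame
            continue
        if current and cosine(current[2][-1], emb) >= sim_threshold:
            current[1] = i          # extend the running continuum
            current[2].append(emb)
        else:
            current = [i, i, [emb]]  # start a new continuum
            continua.append(current)
    return [tuple(c) for c in continua]

def identify(continuum_embs, gallery, sim_threshold=0.6):
    """Identify a cluster by voting over every frame in it, instead of
    trusting any single frame. `gallery` maps names to reference embeddings."""
    votes = {}
    for emb in continuum_embs:
        best, best_sim = None, sim_threshold
        for name, ref in gallery.items():
            s = cosine(emb, ref)
            if s > best_sim:
                best, best_sim = name, s
        if best:
            votes[best] = votes.get(best, 0) + 1
    return max(votes, key=votes.get) if votes else None
```

Voting over the whole continuum is what gives the method its robustness: a few frames with bad lighting or an averted face are outvoted by the rest of the cluster.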
