Cast indexing for videos by NCuts and page ranking

Cast indexing is an important video mining technique that lets viewers efficiently retrieve scenes, events, and stories of interest from a long video. This paper proposes a novel cast indexing approach based on Normalized Graph Cuts (NCuts) and PageRank. The system first uses a face tracker to group the face images in each shot into face sets, and then extracts local SIFT features as the feature representation. Cast indexing poses two key problems. The first is to find an optimal partition that clusters face sets into the main cast. The second is to exploit the latent relationships among characters to provide a more accurate cast ranking. For the first problem, we model each face set as a graph node and adopt NCuts to obtain an optimal graph partition. We propose a novel local neighborhood distance, which is robust to outliers, to measure the distance between face sets for NCuts. For the second problem, we build a relation graph of characters from their co-occurrence information, and then adopt the PageRank algorithm to estimate the Important Factor (IF) of each character. The PageRank IF is fused with the content-based retrieval score to produce the final ranking. Experiments on movies, TV series, and home videos show promising results that demonstrate the effectiveness of the proposed methods.
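
The ranking step described above can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify how edges are weighted or how the two scores are fused, so the sketch assumes that edge weights count shared scenes, that PageRank uses the common damping factor of 0.85, and that fusion is a simple weighted sum. The names `character_importance`, `fuse_scores`, and `alpha` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_graph(scenes):
    """Symmetric weights: weights[a][b] = number of scenes shared by a and b.

    Assumption: co-occurrence is counted per scene; the paper may use another unit.
    """
    weights = defaultdict(lambda: defaultdict(float))
    for cast in scenes:
        for a, b in combinations(sorted(set(cast)), 2):
            weights[a][b] += 1.0
            weights[b][a] += 1.0
    return weights


def pagerank(weights, nodes, damping=0.85, iterations=100, tol=1e-10):
    """Weighted PageRank by power iteration; isolated nodes spread rank uniformly."""
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    strength = {v: sum(weights[v].values()) for v in nodes}
    for _ in range(iterations):
        # Rank held by characters with no co-occurrences is shared by everyone.
        dangling = sum(rank[v] for v in nodes if strength[v] == 0)
        new_rank = {}
        for v in nodes:
            # The graph is symmetric, so v's neighbours are also its in-links.
            incoming = sum(rank[u] * weights[u][v] / strength[u] for u in weights[v])
            new_rank[v] = (1 - damping) / n + damping * (incoming + dangling / n)
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


def character_importance(scenes):
    """Importance score per character from scene-level co-occurrence."""
    nodes = sorted({name for cast in scenes for name in cast})
    return pagerank(cooccurrence_graph(scenes), nodes)


def fuse_scores(retrieval, importance, alpha=0.5):
    """Assumed fusion: weighted sum of retrieval score and max-normalised importance."""
    top = max(importance.values())
    return {c: alpha * retrieval[c] + (1 - alpha) * importance[c] / top
            for c in retrieval}


if __name__ == "__main__":
    scenes = [["A", "B"], ["A", "C"], ["A", "B", "C"], ["D"]]
    importance = character_importance(scenes)
    for name, score in sorted(importance.items(), key=lambda kv: -kv[1]):
        print(name, round(score, 3))
```

In this toy input, character A shares scenes with both B and C and so receives the largest importance score, while D, who never co-occurs with anyone, receives the smallest. The weight `alpha` controls how much the final ranking trusts visual similarity versus the relation graph; its value here is arbitrary.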
