Near-duplicate keyframe retrieval by semi-supervised learning and nonrigid image matching

Near-duplicate keyframe (NDK) retrieval is critical to many real-world multimedia applications, and near-duplicate image and keyframe retrieval has attracted growing attention in the multimedia community over the last few years. To support NDK retrieval on large-scale data, we propose a Multi-Level Ranking (MLR) scheme that retrieves NDKs in a coarse-to-fine manner. One key stage of the MLR scheme is learning an effective ranking function from extremely few training examples in a near-duplicate detection task. To address this challenge, we employ a semi-supervised learning method, the semi-supervised support vector machine, which significantly improves retrieval performance by exploiting unlabeled data. Another key stage of the MLR scheme is fine matching among the subset of keyframe candidates returned by the preceding coarse ranking stage. In contrast to previous approaches based on either simple heuristics or rigid matching models, we propose a novel Nonrigid Image Matching (NIM) approach to perform this fine matching on keyframes from real-world video corpora. Compared with conventional methods, the NIM approach recovers an explicit mapping between two near-duplicate images using a few deformation parameters and, at the same time, identifies the correct correspondences in noisy data. To evaluate the proposed approach, we performed extensive experiments on two benchmark testbeds extracted from the TRECVID2003 and TRECVID2004 corpora. The promising results indicate that the proposed method is more effective than other state-of-the-art approaches for near-duplicate keyframe retrieval.
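The coarse-to-fine idea behind the MLR scheme can be illustrated with a minimal sketch. It is not the authors' implementation: Euclidean distance on global feature vectors stands in for the learned ranking function of the coarse stage, and `fine_score` is a hypothetical placeholder for a fine matching score such as the one NIM would produce. The names `coarse_to_fine_rank`, `global_features`, and `top_k` are likewise assumptions made for illustration.

```python
import math
from typing import Callable, Dict, List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def coarse_to_fine_rank(
    query_id: str,
    global_features: Dict[str, Sequence[float]],
    fine_score: Callable[[str, str], float],
    top_k: int = 3,
) -> List[str]:
    """Rank keyframes for a query in two stages.

    Coarse stage (assumption): sort all other keyframes by distance between
    cheap global features and keep the top_k closest.
    Fine stage (assumption): re-rank only that shortlist with a more
    expensive pairwise score, where higher means more similar.
    """
    query_vec = global_features[query_id]
    candidates = [k for k in global_features if k != query_id]

    # Coarse ranking: ascending distance to the query.
    candidates.sort(key=lambda k: euclidean(query_vec, global_features[k]))
    shortlist = candidates[:top_k]

    # Fine ranking: descending score; the sort is stable, so ties keep
    # their coarse-stage order.
    return sorted(shortlist, key=lambda k: fine_score(query_id, k), reverse=True)


# Toy data, invented purely for illustration.
features = {
    "q": [0.0, 0.0],
    "a": [0.1, 0.0],
    "b": [0.2, 0.0],
    "c": [5.0, 5.0],
    "d": [0.3, 0.0],
}
toy_scores = {("q", "a"): 2.0, ("q", "b"): 9.0, ("q", "d"): 4.0}


def toy_fine_score(query: str, candidate: str) -> float:
    """Hypothetical stand-in for an expensive fine matching score."""
    return toy_scores.get((query, candidate), 0.0)


ranking = coarse_to_fine_rank("q", features, toy_fine_score, top_k=3)
print(ranking)
```

In this sketch the expensive score is evaluated only on the `top_k` shortlist, which is the point of a coarse-to-fine design: the cheap stage prunes the corpus so that the costly matching runs on a small candidate set.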
