Near-duplicate keyframe retrieval by nonrigid image matching

Near-duplicate image retrieval plays an important role in many real-world multimedia applications, but most previous approaches have limitations. Conventional appearance-based methods are sensitive to illumination variations and occlusion, and local feature correspondence-based methods often ignore local deformations and the spatial coherence between two point sets. In this paper, we propose a Nonrigid Image Matching (NIM) approach for near-duplicate keyframe retrieval from real-world video corpora. In contrast to previous approaches, NIM recovers an explicit mapping between two near-duplicate images using only a few deformation parameters and effectively identifies the correct correspondences in noisy data. To make the technique applicable to large-scale applications, we introduce a multi-level ranking scheme that filters out irrelevant results in a coarse-to-fine manner. Within this ranking scheme, we employ a semi-supervised learning method that exploits unlabeled data to overcome the extremely small size of the training set. We evaluate our solution with extensive experiments on two benchmark testbeds extracted from the TRECVID2003 and TRECVID2004 corpora. The results show that the proposed method is more effective than other state-of-the-art approaches for near-duplicate keyframe retrieval.
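
The coarse-to-fine idea behind the multi-level ranking scheme can be illustrated with a minimal two-level sketch. This is not the paper's implementation: the function names (`coarse_to_fine_rank`, `l1_distance`), the use of an L1 distance on global feature vectors for the coarse level, and the `fine_score` callback standing in for an expensive matcher such as NIM are all assumptions made for illustration only.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def l1_distance(a: Vector, b: Vector) -> float:
    """Cheap distance between two global feature vectors (assumed choice)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def coarse_to_fine_rank(
    query_id: str,
    global_features: Dict[str, Vector],
    fine_score: Callable[[str, str], float],
    keep: int,
) -> List[Tuple[str, float]]:
    """Rank keyframes against a query in two levels.

    Coarse level: sort all keyframes by a cheap global-feature distance
    and keep only the `keep` closest candidates.
    Fine level: rescore the survivors with an expensive matcher
    (`fine_score`, hypothetical; higher means more similar).
    """
    query_vec = global_features[query_id]

    # Coarse level: cheap filtering over the whole collection.
    candidates = sorted(
        (k for k in global_features if k != query_id),
        key=lambda k: l1_distance(query_vec, global_features[k]),
    )[:keep]

    # Fine level: expensive scoring only on the surviving candidates.
    scored = [(k, fine_score(query_id, k)) for k in candidates]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy data: "c" is far from the query and is dropped at the coarse level.
    features = {"q": [0.0, 0.0], "a": [0.1, 0.0], "b": [0.2, 0.1], "c": [5.0, 5.0]}
    toy_scores = {"a": 3.0, "b": 10.0, "c": 99.0}
    ranking = coarse_to_fine_rank("q", features, lambda q, k: toy_scores[k], keep=2)
    print(ranking)
```

The point of the design is that the expensive matcher runs only on the few candidates that pass the cheap filter, so the cost of the fine level no longer grows with the size of the corpus.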
