Video fingerprinting: features for duplicate and similar video detection and query-based video retrieval

A video "fingerprint" is a feature extracted from a video that represents it compactly, allowing faster search without compromising retrieval accuracy. Motivated by video summarization, we represent each video by a set of keyframes and experiment with different features for describing each keyframe, with the goal of identifying duplicate and similar videos. Duplicate videos are generated by applying image processing operations such as blurring, gamma correction, JPEG compression, and Gaussian noise addition to the individual video frames. Random and bursty frame drops of 20%, 40%, and 60% (over the entire video) are also applied to create noisier "duplicate" videos. Similar videos have similar content but differ in camera angles, cuts, and the idiosyncrasies that occur during successive retakes of a video. Among the feature sets compared, the Compact Fourier-Mellin Transform (CFMT) performs best for duplicate video detection, while Scale Invariant Feature Transform (SIFT) features outperform other features of comparable dimension for similar video retrieval. We also address the problem of retrieving full-length videos with shorter clip queries. For identical feature size, CFMT performs best for video retrieval.
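As a rough illustration of the frame-drop distortions described above, the sketch below simulates random and bursty drops over a sequence of frame indices. It is not the authors' procedure: the function names, the `burst_len` parameter, and the reading of "bursty" as contiguous runs of dropped frames are all assumptions made for this example.

```python
import random


def random_drop(num_frames, drop_rate, seed=0):
    """Return indices of frames kept after dropping frames uniformly at random.

    Hypothetical helper: drop_rate is the fraction of the whole video removed.
    """
    rng = random.Random(seed)
    num_drop = round(num_frames * drop_rate)
    dropped = set(rng.sample(range(num_frames), num_drop))
    return [i for i in range(num_frames) if i not in dropped]


def bursty_drop(num_frames, drop_rate, burst_len=10, seed=0):
    """Return indices of frames kept after dropping contiguous runs of frames.

    Assumption: a "burst" is a run of up to burst_len consecutive frames
    starting at a random position; bursts may overlap or touch.
    """
    rng = random.Random(seed)
    num_drop = round(num_frames * drop_rate)
    dropped = set()
    while len(dropped) < num_drop:
        start = rng.randrange(num_frames)
        for i in range(start, min(start + burst_len, num_frames)):
            if len(dropped) == num_drop:
                break
            dropped.add(i)
    return [i for i in range(num_frames) if i not in dropped]


if __name__ == "__main__":
    # Drop rates taken from the abstract; the 1000-frame video is illustrative.
    for rate in (0.2, 0.4, 0.6):
        kept_random = random_drop(1000, rate)
        kept_bursty = bursty_drop(1000, rate)
        print(rate, len(kept_random), len(kept_bursty))
```

Both functions remove the same fraction of frames overall; they differ only in how the dropped frames are distributed in time, which is what distinguishes the two error types.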
