A Quantitative Study of Video Duplicate Levels in YouTube

The popularity of video sharing services has increased exponentially in recent years, but this popularity is accompanied by challenges associated with the tremendous scale of user bases and massive amounts of video data. A known inefficiency of video sharing services with user-uploaded content is widespread video duplication. These duplicate videos are often of different aspect ratios, can contain overlays or additional borders, or can be excerpted from a longer, original video, and thus can be difficult to detect. The proliferation of duplicate videos can have an impact at many levels, and accurate assessment of duplicate levels is a critical step toward mitigating their effects on both video sharing services and network infrastructure.

[1]  Christoph Zauner,et al.  Implementation and Benchmarking of Perceptual Image Hash Functions , 2010 .

[2]  Aditya Akella,et al.  An information-aware QoE-centric mobile video cache , 2013, MobiCom.

[3]  David G. Lowe,et al.  Object recognition from local scale-invariant features , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[4]  S. Chiba,et al.  Dynamic programming algorithm optimization for spoken word recognition , 1978 .

[5]  Mandar Kulkarni,et al.  Analyzing Compute vs. Storage Tradeoff for Video-aware Storage Efficiency , 2012, HotStorage.

[6]  Cordelia Schmid,et al.  INRIA LEAR-TEXMEX: Video Copy Detection Task , 2010, TRECVID.

[7]  Cezary Dubnicki,et al.  HydraFS: A High-Throughput File System for the HYDRAstor Content-Addressable Storage System , 2010, FAST.

[8]  Hung-Khoon Tan,et al.  Real-Time Near-Duplicate Elimination for Web Video Search With Content and Context , 2009, IEEE Transactions on Multimedia.

[9]  Cordelia Schmid,et al.  A Performance Evaluation of Local Descriptors , 2005, IEEE Trans. Pattern Anal. Mach. Intell..

[10]  Jon B. Weissman,et al.  ViDeDup: An Application-Aware Framework for Video De-duplication , 2011, HotStorage.

[11]  Chong-Wah Ngo,et al.  Practical elimination of near-duplicates from web video search , 2007, ACM Multimedia.

[12]  Zi Huang,et al.  UQLIPS: A Real-time Near-duplicate Video Clip Detection System , 2007, VLDB.

[13]  William J. Bolosky,et al.  Single instance storage in Windows® 2000 , 2000 .

[14]  Cordelia Schmid,et al.  INRIA-LEAR'S Video Copy Detection System , 2008, TRECVID.

[15]  Michal Kaczmarczyk,et al.  HYDRAstor: A Scalable Secondary Storage , 2009, FAST.

[16]  Kai Li,et al.  Avoiding the Disk Bottleneck in the Data Domain Deduplication File System , 2008, FAST.

[17]  Ruud M. Bolle,et al.  Comparison of sequence matching techniques for video copy detection , 2001, IS&T/SPIE Electronic Imaging.

[18]  Zhi-Li Zhang,et al.  Counting YouTube videos via random prefix sampling , 2011, IMC '11.

[19]  Chong-Wah Ngo,et al.  Evaluating bag-of-visual-words representations in scene classification , 2007, MIR '07.

[20]  Ethan L. Miller,et al.  The effectiveness of deduplication on virtual machine disk images , 2009, SYSTOR '09.