V3C1 Dataset: An Evaluation of Content Characteristics

In this work we analyze content statistics of the V3C1 dataset, the first partition of the Vimeo Creative Commons Collection (V3C). The dataset was designed to represent true web videos in the wild, with good visual quality and diverse content characteristics, and will serve as the evaluation basis for the Video Browser Showdown (VBS) 2019-2021 and the TREC Video Retrieval Evaluation (TRECVID) Ad-Hoc Video Search (AVS) tasks 2019-2021. The dataset comes with a shot segmentation (around 1 million shots), for which we analyze content specifics and statistics. Our research shows that the content of V3C1 is very diverse, has no predominant characteristics, and exhibits low self-similarity. It is therefore well suited for video retrieval evaluations and for participants of TRECVID AVS or the VBS.
