FCVID : Fudan-Columbia Video Dataset

Recognizing visual contents in unconstrained videos has become a very important problem for many applications, such as Web video search and recommendation, smart content-aware advertising, robotics, etc. Existing datasets for video content recognition are either small or do not have reliable manual labels. In this work, we construct and release a new Internet video dataset called Fudan-Columbia Video Dataset (FCVID), containing 91,223 Web videos (total duration 4,232 hours) annotated manually according to 239 categories. We believe that the release of FCVID can stimulate innovative research on this challenging and important problem.

[1]  Bernard Ghanem,et al.  ActivityNet: A large-scale video benchmark for human activity understanding , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[3]  Fei-Fei Li,et al.  Large-Scale Video Classification with Convolutional Neural Networks , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[4]  Mubarak Shah,et al.  UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild , 2012, ArXiv.

[5]  Cordelia Schmid,et al.  Learning realistic human actions from movies , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[6]  Matthijs C. Dorst Distinctive Image Features from Scale-Invariant Keypoints , 2011 .

[7]  Ronen Basri,et al.  Actions as Space-Time Shapes , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[8]  Jiebo Luo,et al.  Kodak consumer video benchmark data set : concept definition and annotation * * , 2008 .

[9]  Shih-Fu Chang,et al.  Consumer video understanding: a benchmark database and an evaluation of human and machine performance , 2011, ICMR.

[10]  Dong Liu,et al.  EventNet: A Large Scale Structured Concept Library for Complex Event Detection in Video , 2015, ACM Multimedia.

[11]  Thomas Serre,et al.  HMDB: A large video database for human motion recognition , 2011, 2011 International Conference on Computer Vision.

[12]  B. Caputo,et al.  Recognizing human actions: a local SVM approach , 2004, Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004..

[13]  Juan Carlos Niebles,et al.  Modeling Temporal Structure of Decomposable Motion Segments for Activity Classification , 2010, ECCV.

[14]  Cordelia Schmid,et al.  Actions in context , 2009, CVPR.

[15]  Bernt Schiele,et al.  2D Human Pose Estimation: New Benchmark and State of the Art Analysis , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.