LabelMe video: Building a video database with human annotations

Video analysis algorithms currently suffer from a lack of information about the objects present in a scene and their interactions, as well as from the absence of comprehensive annotated video databases for benchmarking. We designed an online, openly accessible video annotation system that allows anyone with a browser and internet access to efficiently annotate object category, shape, motion, and activity information in real-world videos. The annotations are further complemented with knowledge from static image databases to infer occlusion and depth information. Using this system, we have built a scalable video database composed of diverse video samples paired with human-guided annotations. We demonstrate potential uses of this database by studying motion statistics as well as cause-effect motion relationships between objects.
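To make the kind of annotation described above concrete, the sketch below shows one plausible record for a single annotated object, assuming the keyframe-polygon scheme common in browser-based video annotation tools: the user outlines an object on a few keyframes and the tool propagates the shape to the frames in between. The field names and the linear interpolation are illustrative assumptions, not the paper's actual storage format or interpolation method.

```python
# Minimal sketch of a video object annotation, assuming keyframe polygons
# with linear interpolation between annotated frames (hypothetical schema).
from dataclasses import dataclass, field

Point = tuple[float, float]

@dataclass
class VideoAnnotation:
    category: str                   # object class, e.g. "car"
    activity: str                   # action label, e.g. "driving"
    # frame index -> polygon outline drawn by the annotator at that frame
    keyframes: dict[int, list[Point]] = field(default_factory=dict)

    def polygon_at(self, frame: int) -> list[Point]:
        """Polygon at `frame`, linearly interpolated between the two
        nearest keyframes (a simple stand-in for the shape interpolation
        a real tool would use)."""
        frames = sorted(self.keyframes)
        if frame <= frames[0]:
            return self.keyframes[frames[0]]
        if frame >= frames[-1]:
            return self.keyframes[frames[-1]]
        lo = max(f for f in frames if f <= frame)   # keyframe at or before
        hi = min(f for f in frames if f >= frame)   # keyframe at or after
        if lo == hi:
            return self.keyframes[lo]
        t = (frame - lo) / (hi - lo)
        return [
            (x0 + t * (x1 - x0), y0 + t * (y1 - y0))
            for (x0, y0), (x1, y1) in zip(self.keyframes[lo], self.keyframes[hi])
        ]

# Usage: a car outlined on frames 0 and 10; frame 5 yields the midpoint shape.
car = VideoAnnotation("car", "driving", {0: [(0, 0), (4, 0), (4, 2)],
                                         10: [(10, 0), (14, 0), (14, 2)]})
print(car.polygon_at(5))  # [(5.0, 0.0), (9.0, 0.0), (9.0, 2.0)]
```

This keeps annotation cost low because only a handful of keyframes per object need manual outlining; everything in between is derived.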
