Human-assisted motion annotation

Obtaining ground-truth motion for arbitrary, real-world video sequences is a challenging but important task for both algorithm evaluation and model design. Existing ground-truth databases are either synthetic, such as the Yosemite sequence, or limited to indoor, experimental setups, such as the database developed by Baker et al. (2007). We propose a human-in-the-loop methodology for creating a ground-truth motion database for videos taken with ordinary cameras in both indoor and outdoor scenes, exploiting the fact that humans are experts at segmenting objects and inspecting the match between two frames. We designed an interactive computer vision system that allows a user to annotate motion efficiently. Our methodology is cross-validated by showing that human-annotated motion is repeatable, consistent across annotators, and close to the ground truth obtained by Baker et al. (2007). Using our system, we collected and annotated 10 indoor and outdoor real-world videos to form a ground-truth motion database. The source code, annotation tool, and database are available online for public evaluation and benchmarking.

[1] Thomas S. Huang, et al. Image processing, 1971.

[2] Takeo Kanade, et al. An Iterative Image Registration Technique with an Application to Stereo Vision, 1981, IJCAI.

[3] Berthold K. P. Horn, et al. Determining Optical Flow, 1981, Artificial Intelligence.

[4] Richard Szeliski, et al. Fast Surface Interpolation Using Hierarchical Basis Functions, 1990, IEEE Trans. Pattern Anal. Mach. Intell..

[5] Edward H. Adelson, et al. Layered representations for image coding, 1991.

[6] Edward H. Adelson, et al. Representing moving images with layers, 1994, IEEE Trans. Image Process..

[7] Carlo Tomasi, et al. Good features to track, 1994, CVPR.

[8] Edward H. Adelson, et al. Perceptually Organized EM: A Framework for Motion Segmentation That Combines Information about Form and Motion, 1995.

[9] Michael J. Black, et al. The Robust Estimation of Multiple Motions: Parametric and Piecewise-Smooth Flow Fields, 1996, Comput. Vis. Image Underst..

[10] Richard Szeliski, et al. An integrated Bayesian approach to layer extraction from image sequences, 1999, ICCV.

[11] Richard Szeliski, et al. An Integrated Bayesian Approach to Layer Extraction from Image Sequences, 2001, IEEE Trans. Pattern Anal. Mach. Intell..

[12] Jitendra Malik, et al. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics, 2001, ICCV.

[13] Rachid Deriche, et al. Symmetrical Dense Optical Flow Estimation with Occlusions Detection, 2002, ECCV.

[14] David Salesin, et al. Video matting of complex scenes, 2002, SIGGRAPH.

[15] Qionghai Dai, et al. An accurate scene-based traffic model for MPEG video stream, 2003, ICECS.

[16] Serge J. Belongie, et al. What went where, 2003, CVPR.

[17] Thomas Brox, et al. High Accuracy Optical Flow Estimation Based on a Theory for Warping, 2004, ECCV.

[18] Michael Isard, et al. CONDENSATION—Conditional Density Propagation for Visual Tracking, 1998, International Journal of Computer Vision.

[19] David Salesin, et al. Keyframe-based tracking for rotoscoping and animation, 2004, ACM Trans. Graph..

[20] Ivar Austvoll, et al. A Study of the Yosemite Sequence Used as a Test Sequence for Estimation of Optical Flow, 2005, SCIA.

[21] Jian Sun, et al. Video object cut and paste, 2005, SIGGRAPH.

[22] Antonio Torralba, et al. Motion magnification, 2005, SIGGRAPH.

[23] Michael J. Black, et al. On the Spatial Statistics of Optical Flow, 2005, ICCV.

[24] Maneesh Agrawala, et al. Interactive video cutout, 2005, SIGGRAPH.

[25] Joachim Weickert, et al. Lucas/Kanade meets Horn/Schunck: combining local and global optic flow methods, 2005.

[26] Edward H. Adelson, et al. Analysis of Contour Motions, 2006, NIPS.

[27] Seth J. Teller, et al. Particle Video: Long-Range Motion Estimation Using Point Trajectories, 2006, CVPR.

[28] Antonio Torralba, et al. LabelMe: A Database and Web-Based Tool for Image Annotation, 2008, International Journal of Computer Vision.

[29] Richard Szeliski, et al. A Database and Evaluation Methodology for Optical Flow, 2007, ICCV.