Semi-automatic ground truth annotation in videos: An interactive tool for polygon-based object annotation and segmentation

Knowledge extraction from video data is challenging due to its high complexity in both the spatial and temporal domains. Ground truth is crucial for evaluating algorithms and for adapting them to new domains. Unfortunately, ground truth annotation is inconvenient and time-consuming. Common annotation tools mostly rely on simple geometric primitives such as rectangles or ellipses. Here we propose a novel, interactive, semi-automatic process that actively asks for user input if the result of the automatic annotation appears to be incorrect. After a brief review of related tools for video annotation, we explain our proposed semi-automatic method, iSeg, using a prototype implementation. iSeg has been tested on two visual stimulus datasets for eye tracking experiments and on two surveillance datasets. The experimental results and the usability are compared to those of existing annotation tools. Finally, we discuss the properties and opportunities of polygon-based video annotation.
