An interactive tool for manual, semi-automatic and automatic video annotation

The annotation of image and video data of large datasets is a fundamental task in multimedia information retrieval and computer vision applications. The aim of annotation tools is to relieve the user from the burden of the manual annotation as much as possible. To achieve this ideal goal, many different functionalities are required in order to make the annotations process as automatic as possible. Motivated by the limitations of existing tools, we have developed the iVAT: an interactive Video Annotation Tool. It supports manual, semi-automatic and automatic annotations through the interaction of the user with various detection algorithms. To the best of our knowledge, it is the first tool that integrates several computer vision algorithms working in an interactive and incremental learning framework. This makes the tool flexible and suitable to be used in different application domains. A quantitative and qualitative evaluation of the proposed tool on a challenging case study domain is presented and discussed. Results demonstrate that the use of the semi-automatic, as well as the automatic, modality drastically reduces the human effort while preserving the quality of the annotations.
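The interactive, incremental workflow the abstract describes can be sketched as a propose-correct-update loop: a detector proposes annotations for each frame, the user corrects them, and the corrections are fed back to refine the detector. The sketch below is purely illustrative; all names (`propose`, `annotate_video`, the score-based detector) are hypothetical and do not reflect iVAT's actual API or learning algorithm.

```python
# Hypothetical sketch of a semi-automatic annotation loop with incremental
# feedback. The "detector" is modeled as a dict of per-label confidence
# scores; a real tool would wrap trained detectors (e.g. cascade or
# HOG-based) instead.

def propose(detector, frame):
    """Return the labels the detector currently predicts for a frame."""
    return [label for label, score in detector.items() if score > 0.5]

def annotate_video(frames, detector, user_correct):
    """Propose annotations, let the user fix them, learn from the fixes."""
    annotations = []
    for frame in frames:
        proposal = propose(detector, frame)
        corrected = user_correct(frame, proposal)  # manual review step
        annotations.append(corrected)
        # Incremental update: raise confidence for confirmed labels,
        # lower it for labels the user rejected.
        for label in corrected:
            detector[label] = min(1.0, detector.get(label, 0.0) + 0.2)
        for label in set(proposal) - set(corrected):
            detector[label] = max(0.0, detector[label] - 0.2)
    return annotations
```

In this toy setup, each pass through the user's corrections nudges the detector so that later frames need fewer manual fixes, which is the effort reduction the evaluation reports.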
