Ground truth annotation of traffic video data

This paper presents a software application for generating ground-truth data from video files recorded by traffic surveillance cameras used in Intelligent Transportation Systems (ITS). The computer vision system to be evaluated measures the number of vehicles that cross a line per unit of time (the intensity), the average speed, and the occupancy. The main goal of the visual interface presented here is to be easy to use without requiring any specialized hardware: it runs on a standard laptop or desktop computer equipped with a jog shuttle wheel. The setup is efficient and comfortable because the annotator keeps one hand on the space bar of the keyboard almost all the time while the other hand rests on the jog shuttle wheel. The mean time required to annotate a video file ranges from 1 to 5 times its duration (per lane), depending on the content. Compared with a general-purpose annotation tool, annotation is about 7 times faster.
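To make the three evaluated quantities concrete, the following minimal sketch computes intensity, average speed, and occupancy for one lane from a list of annotated vehicle passages. The `Passage` record, its field names, and the `lane_stats` function are hypothetical and are not the annotation format of the tool described here. The sketch also assumes that occupancy means the fraction of the observation window during which a vehicle is over the counting line, and that the average speed is the arithmetic mean over the counted vehicles.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Passage:
    """One annotated vehicle crossing the counting line (hypothetical layout)."""
    t_enter: float    # seconds: vehicle front reaches the line
    t_exit: float     # seconds: vehicle rear leaves the line
    speed_kmh: float  # annotated speed in km/h


def lane_stats(passages: List[Passage], t_start: float, t_end: float
               ) -> Tuple[float, Optional[float], float]:
    """Return (intensity in vehicles/hour, mean speed in km/h, occupancy in [0, 1])."""
    duration = t_end - t_start
    if duration <= 0:
        raise ValueError("t_end must be greater than t_start")

    # A vehicle is counted in the window in which it first reaches the line.
    counted = [p for p in passages if t_start <= p.t_enter < t_end]
    n = len(counted)

    intensity = n * 3600.0 / duration
    mean_speed = sum(p.speed_kmh for p in counted) / n if n else None

    # Time the line is covered, clipped to the end of the window.
    occupied = sum(min(p.t_exit, t_end) - p.t_enter for p in counted)
    occupancy = occupied / duration
    return intensity, mean_speed, occupancy


if __name__ == "__main__":
    demo = [
        Passage(1.0, 1.5, 60.0),
        Passage(10.0, 10.5, 80.0),
        Passage(70.0, 70.4, 50.0),  # falls outside the 0-60 s window
    ]
    print(lane_stats(demo, 0.0, 60.0))
```

Comparing such per-window values computed from the ground truth with those reported by the vision system is one way the annotations could be used for evaluation; the window length is a free parameter of this sketch.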
