Automatic Ground Truths: Projected Image Annotations for Omnidirectional Vision

We present a novel data set of omnidirectional video of multiple objects whose centroid positions are annotated automatically. Omnidirectional vision is an active field of research focused on the use of spherical imagery for video analysis and scene understanding, involving tasks such as object detection, tracking, and recognition. Our goal is to provide a large and consistently annotated video data set for training and evaluating new algorithms for these tasks. Here we describe the experimental setup and software environment used to capture the video and to map the 3D ground-truth positions of multiple objects into the image. We also estimate the expected systematic error in the mapped positions. In addition to the final data products, we publicly release the software tools and raw data needed to re-calibrate the camera and/or redo this mapping. The software also provides a simple framework for comparing the output of standard image annotation tools or visual tracking systems against our mapped ground-truth annotations.
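To make the mapping step concrete, the sketch below projects a world-frame ground-truth centroid into pixel coordinates. It is a minimal illustration, assuming an equidistant fisheye model (image radius r = f * theta); the function names and the parameters f, cx, cy, R, and t are illustrative placeholders, not the calibration actually recovered for our camera.

    import numpy as np

    def project_equidistant(point_cam, f, cx, cy):
        # Equidistant fisheye model: image radius r = f * theta, where
        # theta is the angle between the ray and the optical axis (+z).
        x, y, z = point_cam
        theta = np.arctan2(np.hypot(x, y), z)   # angle from the optical axis
        phi = np.arctan2(y, x)                   # azimuth around the axis
        r = f * theta
        return (cx + r * np.cos(phi), cy + r * np.sin(phi))

    def world_to_pixel(point_world, R, t, f, cx, cy):
        # Rigid transform of a world-frame ground-truth centroid into the
        # camera frame, followed by the fisheye projection above.
        p_cam = R @ np.asarray(point_world) + t
        return project_equidistant(p_cam, f, cx, cy)

    # Example: identity pose, a point 0.5 m right of and 2 m in front of the camera.
    R = np.eye(3)
    t = np.zeros(3)
    u, v = world_to_pixel([0.5, 0.0, 2.0], R, t, f=320.0, cx=640.0, cy=480.0)

In practice, project_equidistant would be replaced by whatever model the calibration procedure recovers (e.g., a polynomial radial model), but the world-to-camera-to-pixel pipeline is the same, and it is this pipeline that produces the mapped annotations against which tracker output is compared.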
