Vision Meets Drones: A Challenge

In this paper we present a large-scale visual object detection and tracking benchmark, named VisDrone2018, aiming at advancing visual understanding tasks on the drone platform. The images and video sequences in the benchmark were captured over various urban/suburban areas of 14 different cities across China from north to south. Specifically, VisDrone2018 consists of 263 video clips and 10,209 images (no overlap with video clips) with rich annotations, including object bounding boxes, object categories, occlusion, truncation ratios, etc. With intensive amount of effort, our benchmark has more than 2.5 million annotated instances in 179,264 images/video frames. Being the largest such dataset ever published, the benchmark enables extensive evaluation and investigation of visual analysis algorithms on the drone platform. In particular, we design four popular tasks with the benchmark, including object detection in images, object detection in videos, single object tracking, and multi-object tracking. All these tasks are extremely challenging in the proposed dataset due to factors such as occlusion, large scale and pose variation, and fast motion. We hope the benchmark largely boost the research and development in visual analysis on drone platforms.

[1]  Luc Van Gool,et al.  Depth and Appearance for Mobile Scene Analysis , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[2]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[3]  P. Fua,et al.  Detecting Flying Objects Using a Single Moving Camera , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[4]  Kaiqi Huang,et al.  An Equalized Global Graph Model-Based Approach for Multicamera Object Tracking , 2017, IEEE Transactions on Circuits and Systems for Video Technology.

[5]  Stefan Roth,et al.  People-tracking-by-detection and people-detection-by-tracking , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[6]  Qingming Huang,et al.  Online Deformable Object Tracking Based on Structure-Aware Hyper-Graph , 2016, IEEE Transactions on Image Processing.

[7]  Shuicheng Yan,et al.  NUS-PRO: A New Visual Tracking Challenge , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[8]  Dariu Gavrila,et al.  Monocular Pedestrian Detection: Survey and Experiments , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9]  Stefan Roth,et al.  MOTChallenge 2015: Towards a Benchmark for Multi-Target Tracking , 2015, ArXiv.

[10]  Pascal Fua,et al.  Multicamera People Tracking with a Probabilistic Occupancy Map , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[11]  Simon Lucey,et al.  Need for Speed: A Benchmark for Higher Frame Rate Object Tracking , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[12]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[13]  Zhenyu He,et al.  The Thermal Infrared Visual Object Tracking VOT-TIR2016 Challenge Results , 2016, ECCV Workshops.

[14]  P. Fua,et al.  Pose estimation for category specific multiview object localization , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[15]  Bernard Ghanem,et al.  A Benchmark and Simulator for UAV Tracking , 2016, ECCV.

[16]  Wesam A. Sakla,et al.  A Large Contextual Dataset for Classification, Detection and Counting of Cars with Deep Learning , 2016, ECCV.

[17]  Luc Van Gool,et al.  The 2005 PASCAL Visual Object Classes Challenge , 2005, MLCW.

[18]  Dan Roth,et al.  Learning to detect objects in images via a sparse, part-based representation , 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[19]  Ming-Hsuan Yang,et al.  Object Tracking Benchmark , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[20]  Winston H. Hsu,et al.  Drone-Based Object Counting by Spatially Regularized Regional Proposal Network , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[21]  Zhenyu He,et al.  The Visual Object Tracking VOT2016 Challenge Results , 2016, ECCV Workshops.

[22]  Jianxiong Xiao,et al.  Tracking Revisited Using RGBD Camera: Unified Benchmark and Baselines , 2013, 2013 IEEE International Conference on Computer Vision.

[23]  Dit-Yan Yeung,et al.  Visual Object Tracking for Unmanned Aerial Vehicles: A Benchmark and New Motion Models , 2017, AAAI.

[24]  Simone Calderara,et al.  Visual Tracking: An Experimental Survey , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[25]  Erik Blasch,et al.  Encoding color information for visual tracking: Algorithms and benchmark , 2015, IEEE Transactions on Image Processing.

[26]  Yi Wu,et al.  Online Object Tracking: A Benchmark , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[27]  Martin Lauer,et al.  UA-DETRAC 2017: Report of AVSS2017 & IWT4S Challenge on Advanced Traffic Monitoring , 2017, 2017 14th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS).

[28]  Frédéric Jurie,et al.  Vehicle detection in aerial imagery : A small target detection benchmark , 2016, J. Vis. Commun. Image Represent..

[29]  Ramakant Nevatia,et al.  Inter-camera Association of Multi-target Tracks by On-Line Learned Appearance Affinity Models , 2010, ECCV.

[30]  Ales Leonardis,et al.  Visual Object Tracking Performance Measures Revisited , 2015, IEEE Transactions on Image Processing.

[31]  Francesco Solera,et al.  Performance Measures and a Data Set for Multi-target, Multi-camera Tracking , 2016, ECCV Workshops.

[32]  Stefan Roth,et al.  MOT16: A Benchmark for Multi-Object Tracking , 2016, ArXiv.

[33]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[34]  Ming-Hsuan Yang,et al.  UA-DETRAC: A new benchmark and protocol for multi-object detection and tracking , 2015, Comput. Vis. Image Underst..

[35]  Pietro Perona,et al.  Pedestrian Detection: An Evaluation of the State of the Art , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[36]  Silvio Savarese,et al.  Learning Social Etiquette: Human Trajectory Understanding In Crowded Scenes , 2016, ECCV.

[37]  Michael Felsberg,et al.  The Visual Object Tracking VOT2015 Challenge Results , 2015, 2015 IEEE International Conference on Computer Vision Workshop (ICCVW).

[38]  Luc Van Gool,et al.  The Pascal Visual Object Classes Challenge: A Retrospective , 2014, International Journal of Computer Vision.

[39]  Andreas Geiger,et al.  Are we ready for autonomous driving? The KITTI vision benchmark suite , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[40]  Richard Szeliski,et al.  A Comparison and Evaluation of Multi-View Stereo Reconstruction Algorithms , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[41]  Yifan Wu,et al.  Planar Object Tracking in the Wild: A Benchmark , 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[42]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[43]  Amit K. Roy-Chowdhury,et al.  A Camera Network Tracking (CamNeT) Dataset and Performance Baseline , 2015, 2015 IEEE Winter Conference on Applications of Computer Vision.

[44]  J. Ferryman,et al.  PETS2009: Dataset and challenge , 2009, 2009 Twelfth IEEE International Workshop on Performance Evaluation of Tracking and Surveillance.

[45]  Richard Szeliski,et al.  A Database and Evaluation Methodology for Optical Flow , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[46]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.