Comparing apples and oranges: Off‐road pedestrian detection on the National Robotics Engineering Center agricultural person‐detection dataset

Person detection from vehicles has progressed rapidly with the advent of multiple high-quality datasets of urban and highway driving, yet no large-scale benchmark is available for the same problem in off-road or agricultural environments. Here we present the National Robotics Engineering Center (NREC) Agricultural Person-Detection Dataset to spur research in these environments. It consists of labeled stereo video of people in orange and apple orchards taken from two perception platforms (a tractor and a pickup truck), along with vehicle position data from Real-Time Kinematic (RTK) GPS. We define a benchmark on part of the dataset that combines a total of 76k labeled person images and 19k sampled person-free images. The dataset highlights several key challenges of the domain, including varying environments, substantial occlusion by vegetation, people in motion and in nonstandard poses, and people seen from a variety of distances; metadata are included to allow targeted evaluation of each of these effects. Finally, we present baseline detection results for three leading approaches from urban pedestrian detection and for our own convolutional neural network approach, which benefits from the incorporation of additional image context. We show that the success of existing approaches on urban data does not transfer directly to this domain.
