Driverseat: Crowdstrapping Learning Tasks for Autonomous Driving

While emerging deep-learning systems have outclassed knowledge-based approaches in many tasks, their application to detection tasks for autonomous technologies remains an open field for scientific exploration. Broadly, there are two major developmental bottlenecks: the unavailability of comprehensively labeled datasets and of expressive evaluation strategies. Approaches for labeling datasets have relied on intensive hand-engineering, and strategies for evaluating learning systems have been unable to identify failure-case scenarios. Human intelligence offers an untapped approach for breaking through these bottlenecks. This paper introduces Driverseat, a technology for embedding crowds around learning systems for autonomous driving. Driverseat utilizes crowd contributions for (a) collecting complex 3D labels and (b) tagging diverse scenarios for ready evaluation of learning systems. We demonstrate how Driverseat can crowdstrap a convolutional neural network on the lane-detection task. More generally, crowdstrapping introduces a valuable paradigm for any technology that can benefit from leveraging the powerful combination of human and computer intelligence.
