Learning with proxy supervision for end-to-end visual learning

Deep neural networks are the state of the art in many tasks, such as image classification, image detection, speech recognition, and text analysis. Here we set out to understand learning in an ‘end-to-end’ manner for an autonomous vehicle, that is, learning the driving decision directly from the perception of the scene. For example, we consider learning a binary ‘stop’/‘go’ decision, with respect to pedestrians, given the input image. We propose to use additional information, referred to as ‘proxy supervision’, to improve learning, and we study its effect on overall performance. We show that proxy labels significantly improve the robustness of learning while achieving accuracy as good as, or better than, that obtained on the original binary classification task alone.
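To make the idea of proxy supervision concrete, the following is a minimal sketch of one common way to combine a main binary ‘stop’/‘go’ loss with an auxiliary loss on proxy labels. It is not taken from the paper: the form of the proxy targets (here, a short vector of auxiliary values), the squared-error proxy loss, the function names, and the `proxy_weight` parameter are all assumptions made for illustration.

```python
import math


def binary_cross_entropy(p, y):
    """Cross-entropy between a predicted probability p and a 0/1 label y."""
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def combined_loss(p_stop, y_stop, proxy_pred, proxy_target, proxy_weight=0.5):
    """Main 'stop'/'go' loss plus a weighted auxiliary (proxy) loss.

    Assumptions (hypothetical, not from the paper):
      - proxy_pred / proxy_target are equal-length lists of floats
      - the proxy loss is mean squared error
      - the two losses are combined by a fixed weight
    """
    main = binary_cross_entropy(p_stop, y_stop)
    proxy = sum((a - b) ** 2 for a, b in zip(proxy_pred, proxy_target)) / len(proxy_target)
    return main + proxy_weight * proxy


# Example: an uncertain 'stop' prediction with a poor proxy prediction.
loss = combined_loss(0.5, 1, [1.0, 1.0], [0.0, 0.0], proxy_weight=0.5)
print(loss)
```

Under these assumptions, setting `proxy_weight` to 0 recovers plain binary classification, which gives a baseline against which the effect of the proxy labels could be compared.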
