Large-Scale Stochastic Scene Generation and Semantic Annotation for Deep Convolutional Neural Network Training in the RoboCup SPL

Object detection and classification are essential tasks in any robotics scenario, and data-driven approaches, specifically deep learning techniques, have been widely adopted for them in recent years. However, in the context of the RoboCup Standard Platform League (SPL), these methods have not yet gained comparable popularity, in large part because sufficiently large, publicly available data sets are lacking: gathering the data is tedious and manual annotation is error-prone. We propose a framework for stochastic scene generation, rendering, and automatic creation of semantically annotated ground truth masks. Using the generated data to train deep convolutional neural networks, we demonstrate compelling classification accuracy on real-world data in a multi-class setting. We also evaluate multiple neural network architectures of varying depth and representational capacity, report their run-times on current NAO-H25 hardware, and examine the amount of sampled training data required.
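
To illustrate what stochastic scene generation with automatic ground truth masks means, the following is a minimal toy sketch, not the framework described above. It samples random scene parameters and rasterizes a per-pixel class mask directly from them, so no manual annotation is needed. The class IDs, the function names `sample_scene` and `render_mask`, the image size, and the simple 2D shapes are all assumptions made for illustration; the actual framework renders 3D scenes.

```python
import random

# Hypothetical class IDs; the label set is an assumption for this sketch.
BACKGROUND, FIELD, BALL, ROBOT = 0, 1, 2, 3


def sample_scene(rng, width=64, height=48):
    """Draw random scene parameters (a toy stand-in for a scene description)."""
    return {
        "width": width,
        "height": height,
        # Image row at which the field begins.
        "horizon": rng.randint(height // 4, height // 2),
        # Ball as (center x, center y, radius).
        "ball": (rng.randint(0, width - 1),
                 rng.randint(height // 2, height - 1),
                 rng.randint(2, 5)),
        # Robot as (left, top, box width, box height).
        "robot": (rng.randint(0, width - 10),
                  rng.randint(height // 4, height // 2), 8, 16),
    }


def render_mask(scene):
    """Rasterize the scene into a 2D list of class IDs (the ground truth mask)."""
    w, h = scene["width"], scene["height"]
    mask = [[FIELD if y >= scene["horizon"] else BACKGROUND for _ in range(w)]
            for y in range(h)]

    # Robot: axis-aligned box, clipped to the image.
    rx, ry, rw, rh = scene["robot"]
    for y in range(ry, min(ry + rh, h)):
        for x in range(rx, min(rx + rw, w)):
            mask[y][x] = ROBOT

    # Ball: filled disc, drawn last so it occludes the robot and field.
    bx, by, br = scene["ball"]
    for y in range(max(0, by - br), min(h, by + br + 1)):
        for x in range(max(0, bx - br), min(w, bx + br + 1)):
            if (x - bx) ** 2 + (y - by) ** 2 <= br ** 2:
                mask[y][x] = BALL
    return mask


rng = random.Random(0)  # fixed seed so the sampled scene is reproducible
scene = sample_scene(rng)
mask = render_mask(scene)
```

Because the labels are derived from the same parameters that define the scene, every sampled scene comes with a mask that is correct by construction; drawing more scenes from `rng` yields as much annotated data as needed.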
