Deep Learning in Robotics: Survey on Model Structures and Training Strategies

The ever-increasing complexity of robot applications creates a need for methods that can address problems with no (viable) analytical solution. Deep learning (DL) provides a set of tools for this kind of problem. This survey categorizes the major challenges in robotics that leverage DL technologies and introduces representative examples of successful solutions to them. We also consider when and whether to use modular models, monolithic models, or end-to-end DL, in order to provide a guideline for selecting an appropriate model structure and training strategy. In doing so, we highlight the current role and adaptability of different techniques at different hierarchical levels of a robot application, providing a well-structured basis to assist future approaches.
