The limits and potentials of deep learning for robotics

The application of deep learning in robotics leads to very specific problems and research questions that are typically not addressed by the computer vision and machine learning communities. In this paper we discuss a number of robotics-specific learning, reasoning, and embodiment challenges for deep learning. We explain the need for better evaluation metrics, highlight the importance and unique challenges for deep robotic learning in simulation, and explore the spectrum between purely data-driven and model-driven approaches. We hope this paper provides a motivating overview of important research directions to overcome the current limitations, and helps to fulfill the promising potentials of deep learning in robotics.

[1]  J. Piaget The construction of reality in the child , 1954 .

[2]  Jaakko Hintikka,et al.  On the Logic of Perception , 1969 .

[3]  Herbert A. Simon,et al.  The Sciences of the Artificial , 1970 .

[4]  J. Moran,et al.  Sensation and perception , 1980 .

[5]  David J. C. MacKay,et al.  A Practical Bayesian Framework for Backpropagation Networks , 1992, Neural Computation.

[6]  David A. Cohn,et al.  Active Learning with Statistical Models , 1996, NIPS.

[7]  Geoffrey E. Hinton,et al.  Bayesian Learning for Neural Networks , 1995 .

[8]  M J Tarr,et al.  What Object Attributes Determine Canonical Views? , 1999, Perception.

[9]  S. Embretson,et al.  Item response theory for psychologists , 2000 .

[10]  D. Povinelli Folk physics for apes : the chimpanzee's theory of how the world works , 2003 .

[11]  A. Yuille,et al.  Object perception as Bayesian inference. , 2004, Annual review of psychology.

[12]  Wolfram Burgard,et al.  Probabilistic Robotics (Intelligent Robotics and Autonomous Agents) , 2005 .

[13]  A. Torralba,et al.  The role of context in object recognition , 2007, Trends in Cognitive Sciences.

[14]  Frank Dellaert,et al.  iSAM: Incremental Smoothing and Mapping , 2008, IEEE Transactions on Robotics.

[15]  Leif H. Finkel,et al.  A Neural Implementation of the Kalman Filter , 2009, NIPS.

[16]  Claes von Hofsten,et al.  Occlusion Is Hard: Comparing Predictive Reaching for Visible and Hidden Objects in Infants and Adults , 2009, Cogn. Sci..

[17]  Thomas L. Griffiths,et al.  Neural Implementation of Hierarchical Bayesian Inference by Importance Sampling , 2009, NIPS.

[18]  Dima Damen,et al.  Recognizing linked events: Searching the space of feasible explanations , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[19]  Peter Stone,et al.  Transfer learning for reinforcement learning on a physical robot , 2010, AAMAS 2010.

[20]  R. Baillargeon,et al.  How Do Infants Reason about Physical Events , 2010 .

[21]  Wolfram Burgard,et al.  Robotics: Science and Systems XV , 2010 .

[22]  Alexei A. Efros,et al.  Unbiased look at dataset bias , 2011, CVPR 2011.

[23]  Wolfram Burgard,et al.  G2o: A general framework for graph optimization , 2011, 2011 IEEE International Conference on Robotics and Automation.

[24]  Frank Dellaert,et al.  iSAM2: Incremental smoothing and mapping using the Bayes tree , 2012, Int. J. Robotics Res..

[25]  Gabriela Csurka,et al.  Metric Learning for Large Scale Image Classification: Generalizing to New Classes at Near-Zero Cost , 2012, ECCV.

[26]  Martin Humenberger,et al.  Neural Information Processing , 2012, Lecture Notes in Computer Science.

[27]  Audrey K. Kittredge,et al.  Object Individuation and Physical Reasoning in Infancy: An Integrative Account , 2012, Language learning and development : the official journal of the Society for Language Development.

[28]  Yuval Tassa,et al.  MuJoCo: A physics engine for model-based control , 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[29]  Andreas Schierwagen,et al.  On reverse engineering in the cognitive and brain sciences , 2012, Natural Computing.

[30]  Jessica B. Hamrick,et al.  Simulation as an engine of physical scene understanding , 2013, Proceedings of the National Academy of Sciences.

[31]  Paul H. J. Kelly,et al.  SLAM++: Simultaneous Localisation and Mapping at the Level of Objects , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[32]  Anderson Rocha,et al.  Toward Open Set Recognition , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[33]  Sanja Fidler,et al.  Holistic Scene Understanding for 3D Object Detection with RGBD Cameras , 2013, 2013 IEEE International Conference on Computer Vision.

[34]  Yoshua Bengio,et al.  An Empirical Investigation of Catastrophic Forgeting in Gradient-Based Neural Networks , 2013, ICLR.

[35]  Trevor Darrell,et al.  Deep Domain Confusion: Maximizing for Domain Invariance , 2014, CVPR 2014.

[36]  George J. Pappas,et al.  Nonmyopic View Planning for Active Object Classification and Pose Estimation , 2014, IEEE Transactions on Robotics.

[37]  Max Welling,et al.  Semi-supervised Learning with Deep Generative Models , 2014, NIPS.

[38]  Terrance E. Boult,et al.  Probability Models for Open Set Recognition , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[39]  Walter J. Scheirer,et al.  Perceptual Annotation: Measuring Human Vision to Improve Computer Vision , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[40]  Thomas L. Dean,et al.  Neural Networks and Neuroscience-Inspired Computer Vision , 2014, Current Biology.

[41]  Sergey Levine,et al.  Learning Neural Network Policies with Guided Policy Search under Unknown Dynamics , 2014, NIPS.

[42]  Joan Bruna,et al.  Intriguing properties of neural networks , 2013, ICLR.

[43]  Geoffrey E. Hinton,et al.  Distilling the Knowledge in a Neural Network , 2015, ArXiv.

[44]  John J. Leonard,et al.  Monocular SLAM Supported Object Recognition , 2015, Robotics: Science and Systems.

[45]  Terrance E. Boult,et al.  Towards Open World Recognition , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[46]  Jason Yosinski,et al.  Deep neural networks are easily fooled: High confidence predictions for unrecognizable images , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[47]  Zoltan-Csaba Marton,et al.  Depth-based tracking with physical constraints for robot manipulation , 2015, 2015 IEEE International Conference on Robotics and Automation (ICRA).

[48]  Yuval Tassa,et al.  Learning Continuous Control Policies by Stochastic Value Gradients , 2015, NIPS.

[49]  Sergey Levine,et al.  Trust Region Policy Optimization , 2015, ICML.

[50]  Leonidas J. Guibas,et al.  Render for CNN: Viewpoint Estimation in Images Using CNNs Trained with Rendered 3D Model Views , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[51]  Michael I. Jordan,et al.  Learning Transferable Features with Deep Adaptation Networks , 2015, ICML.

[52]  Jiajun Wu,et al.  Galileo: Perceiving Physical Object Properties by Integrating a Physics Engine with Deep Learning , 2015, NIPS.

[53]  Rama Chellappa,et al.  Visual Domain Adaptation: A survey of recent advances , 2015, IEEE Signal Processing Magazine.

[54]  Jianxiong Xiao,et al.  3D ShapeNets: A deep representation for volumetric shapes , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[55]  Joshua B. Tenenbaum,et al.  Human-level concept learning through probabilistic program induction , 2015, Science.

[56]  Tapani Raiko,et al.  Semi-supervised Learning with Ladder Networks , 2015, NIPS.

[57]  Sergey Levine,et al.  Towards Adapting Deep Visuomotor Representations from Simulated to Real Environments , 2015, ArXiv.

[58]  Martin A. Riedmiller,et al.  Embed to Control: A Locally Linear Latent Dynamics Model for Control from Raw Images , 2015, NIPS.

[59]  Jonathon Shlens,et al.  Explaining and Harnessing Adversarial Examples , 2014, ICLR.

[60]  George Papandreou,et al.  Weakly-and Semi-Supervised Learning of a Deep Convolutional Network for Semantic Image Segmentation , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[61]  Oliver Brock,et al.  Learning state representations with robotic priors , 2015, Auton. Robots.

[62]  Kate Saenko,et al.  Learning Deep Object Detectors from 3D Models , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).

[63]  Peter I. Corke,et al.  Towards Vision-Based Deep Reinforcement Learning for Robotic Motion Control , 2015, ICRA 2015.

[64]  Trevor Darrell,et al.  Simultaneous Deep Transfer Across Domains and Tasks , 2015, ICCV.

[65]  Yuval Tassa,et al.  Continuous control with deep reinforcement learning , 2015, ICLR.

[66]  Daan Wierstra,et al.  One-Shot Generalization in Deep Generative Models , 2016, ICML.

[67]  François Laviolette,et al.  Domain-Adversarial Training of Neural Networks , 2015, J. Mach. Learn. Res..

[68]  Luca Bertinetto,et al.  Learning feed-forward one-shot learners , 2016, NIPS.

[69]  John J. Leonard,et al.  Past, Present, and Future of Simultaneous Localization and Mapping: Toward the Robust-Perception Age , 2016, IEEE Transactions on Robotics.

[70]  Gustavo Carneiro,et al.  Unsupervised CNN for Single View Depth Estimation: Geometry to the Rescue , 2016, ECCV.

[71]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.

[72]  Martial Hebert,et al.  Learning to Learn: Model Regression Networks for Easy Small Sample Learning , 2016, ECCV.

[73]  Razvan Pascanu,et al.  Progressive Neural Networks , 2016, ArXiv.

[74]  Roland Siegwart,et al.  Receding Horizon "Next-Best-View" Planner for 3D Exploration , 2016, 2016 IEEE International Conference on Robotics and Automation (ICRA).

[75]  Alex Graves,et al.  Asynchronous Methods for Deep Reinforcement Learning , 2016, ICML.

[76]  Stephen James,et al.  3D Simulation for Robot Arm Control with Deep Q-Learning , 2016, ArXiv.

[77]  Pieter Abbeel,et al.  Value Iteration Networks , 2016, NIPS.

[78]  Jitendra Malik,et al.  Learning to Poke by Poking: Experiential Learning of Intuitive Physics , 2016, NIPS.

[79]  Sergey Levine,et al.  Backprop KF: Learning Discriminative Deterministic State Estimators , 2016, NIPS.

[80]  Ian D. Reid,et al.  Learning Depth from Single Monocular Images Using Deep Convolutional Neural Fields , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[81]  Silvio Savarese,et al.  3D-R2N2: A Unified Approach for Single and Multi-view 3D Object Reconstruction , 2016, ECCV.

[82]  Razvan Pascanu,et al.  Interaction Networks for Learning about Objects, Relations and Physics , 2016, NIPS.

[83]  Tae-Kyun Kim,et al.  Recovering 6D Object Pose and Predicting Next-Best-View in the Crowd , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[84]  Sergey Levine,et al.  End-to-End Training of Deep Visuomotor Policies , 2015, J. Mach. Learn. Res..

[85]  Oriol Vinyals,et al.  Matching Networks for One Shot Learning , 2016, NIPS.

[86]  Terrance E. Boult,et al.  Towards Open Set Deep Networks , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[87]  Sergey Levine,et al.  High-Dimensional Continuous Control Using Generalized Advantage Estimation , 2015, ICLR.

[88]  Sergey Levine,et al.  Continuous Deep Q-Learning with Model-based Acceleration , 2016, ICML.

[89]  Bartunov Sergey,et al.  Meta-Learning with Memory-Augmented Neural Networks , 2016 .

[90]  Scott Kuindersma,et al.  Optimization-based locomotion planning, estimation, and control design for the atlas humanoid robot , 2015, Autonomous Robots.

[91]  Honglak Lee,et al.  Perspective Transformer Nets: Learning Single-View 3D Object Reconstruction without 3D Supervision , 2016, NIPS.

[92]  Sergey Levine,et al.  Unsupervised Learning for Physical Interaction through Video Prediction , 2016, NIPS.

[93]  Bernhard Schölkopf,et al.  Unifying distillation and privileged information , 2015, ICLR.

[94]  Zoubin Ghahramani,et al.  Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning , 2015, ICML.

[95]  Gordon Wyeth,et al.  Place categorization and semantic mapping on a mobile robot , 2015, 2016 IEEE International Conference on Robotics and Automation (ICRA).

[96]  Oliver Brock,et al.  Unsupervised Learning of State Representations for Multiple Tasks , 2017 .

[97]  Dieter Fox,et al.  SE3-nets: Learning rigid body motion using deep neural networks , 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[98]  Davide Maltoni,et al.  CORe50: a New Dataset and Benchmark for Continuous Object Recognition , 2017, CoRL.

[99]  Marc Toussaint,et al.  Physical problem solving: Joint planning with symbolic, geometric, and dynamic constraints , 2017, CogSci.

[100]  Gabriela Csurka,et al.  Domain Adaptation for Visual Applications: A Comprehensive Survey , 2017, ArXiv.

[101]  Jürgen Schmidhuber,et al.  Neural Expectation Maximization , 2017, NIPS.

[102]  Vishal M. Patel,et al.  Sparse Representation-Based Open Set Recognition , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[103]  Jitendra Malik,et al.  Hierarchical Surface Prediction for 3D Object Reconstruction , 2017, 2017 International Conference on 3D Vision (3DV).

[104]  Franziska Meier,et al.  SE3-Pose-Nets: Structured Deep Dynamics Models for Visuomotor Planning and Control , 2017, ArXiv.

[105]  Oisin Mac Aodha,et al.  Unsupervised Monocular Depth Estimation with Left-Right Consistency , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[106]  Samy Bengio,et al.  Understanding deep learning requires rethinking generalization , 2016, ICLR.

[107]  Wojciech Zaremba,et al.  Domain randomization for transferring deep neural networks from simulation to the real world , 2017, 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[108]  Kevin Gimpel,et al.  A Baseline for Detecting Misclassified and Out-of-Distribution Examples in Neural Networks , 2016, ICLR.

[109]  Michael Milford,et al.  Meaningful maps with object-oriented semantic mapping , 2016, 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[110]  Ali Farhadi,et al.  Target-driven visual navigation in indoor scenes using deep reinforcement learning , 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[111]  Bharath Hariharan,et al.  Low-Shot Visual Recognition by Shrinking and Hallucinating Features , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[112]  Ali Farhadi,et al.  YOLO9000: Better, Faster, Stronger , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[113]  Charles Blundell,et al.  Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles , 2016, NIPS.

[114]  Yinda Zhang,et al.  DeepContext: Context-Encoding Neural Pathways for 3D Holistic Scene Understanding , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[115]  Michael Beetz,et al.  Envisioning the qualitative effects of robot manipulation actions using simulation-based projections , 2017, Artif. Intell..

[116]  Oliver Brock,et al.  End-to-End Learnable Histogram Filters , 2017 .

[117]  Sergey Levine,et al.  Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks , 2017, ICML.

[118]  Kilian Q. Weinberger,et al.  On Calibration of Modern Neural Networks , 2017, ICML.

[119]  Zoubin Ghahramani,et al.  Deep Bayesian Active Learning with Image Data , 2017, ICML.

[120]  Simon Lucey,et al.  Rethinking Reprojection: Closing the Loop for Pose-Aware Shape Reconstruction from a Single Image , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[121]  Peter I. Corke,et al.  Episode-Based Active Learning with Bayesian Neural Networks , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[122]  Martin A. Riedmiller,et al.  PVEs: Position-Velocity Encoders for Unsupervised Learning of Structured State Representations , 2017, ArXiv.

[123]  Kaiming He,et al.  Mask R-CNN , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[124]  Garrison W. Cottrell,et al.  Deep active object recognition by joint label and action prediction , 2017, Comput. Vis. Image Underst..

[125]  Sergey Levine,et al.  (CAD)$^2$RL: Real Single-Image Flight without a Single Real Image , 2016, Robotics: Science and Systems.

[126]  Alex Kendall,et al.  What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision? , 2017, NIPS.

[127]  Niko Sünderhauf,et al.  Dropout Sampling for Robust Object Detection in Open-Set Conditions , 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[128]  Franziska Meier,et al.  SE3-Pose-Nets: Structured Deep Dynamics Models for Visuomotor Control , 2018, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[129]  Walter J. Scheirer,et al.  PsyPhy: A Psychophysics Driven Evaluation Framework for Visual Recognition , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.