Composable Action-Conditioned Predictors: Flexible Off-Policy Learning for Robot Navigation

A general-purpose intelligent robot must be able to learn autonomously and accomplish multiple tasks in order to be deployed in the real world. However, standard reinforcement learning approaches learn separate task-specific policies and assume the reward function for each task is known a priori. We propose a framework that learns event cues from off-policy data and flexibly combines them at test time to accomplish different tasks. These event cues are not assumed to be known a priori; instead, they are labeled using learned models, such as computer vision detectors, and then "backed up" in time using an action-conditioned predictive model. We show that a simulated robotic car and a real-world RC car can gather data and train fully autonomously, without any human-provided labels beyond those needed to train the detectors, and can then accomplish a variety of different tasks at test time. Videos of the experiments and code can be found at this https URL.
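To make the composition concrete, here is a minimal Python sketch of the test-time idea: learned action-conditioned predictors estimate per-timestep event probabilities for candidate action sequences, and a task is specified simply by reweighting those predictions inside a planner. The `EventPredictor` interface, the weight vectors, and the random-shooting planner are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

class EventPredictor:
    """Stand-in for a learned action-conditioned predictor that maps
    (observation, candidate action sequence) -> per-timestep event
    probabilities, e.g. P(collision) or P(on path). This interface is an
    assumption for illustration; the actual model is a trained network."""

    def predict(self, obs, actions):
        # A real predictor would run a neural network on (obs, actions);
        # here we just return random probabilities with the right shape.
        return np.random.rand(len(actions))

def composed_reward(event_probs, weights):
    """Combine predicted event probabilities into one scalar reward.
    Swapping the weight vector re-targets the same predictors to a new task."""
    return sum(weights[name] * probs.sum() for name, probs in event_probs.items())

def plan(obs, predictors, weights, horizon=10, n_candidates=128):
    """Random-shooting planner (a simple stand-in for MPC-style planning):
    sample action sequences, score each with the composed reward, and
    return the first action of the best-scoring sequence."""
    best_action, best_reward = None, -np.inf
    for _ in range(n_candidates):
        actions = np.random.uniform(-1.0, 1.0, size=horizon)  # e.g. steering angles
        probs = {name: p.predict(obs, actions) for name, p in predictors.items()}
        reward = composed_reward(probs, weights)
        if reward > best_reward:
            best_reward, best_action = reward, actions[0]
    return best_action

# The same predictors serve different tasks at test time via reweighting.
predictors = {"collision": EventPredictor(), "on_path": EventPredictor()}
follow_path = {"collision": -1.0, "on_path": +1.0}  # follow the path, avoid crashing
just_avoid = {"collision": -1.0, "on_path": 0.0}    # only avoid crashing
action = plan(obs=None, predictors=predictors, weights=follow_path)
```

The key property this sketch illustrates is that no retraining is needed to switch tasks: the predictors are learned once from off-policy data, and only the reward weights change at test time.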
