Neural networks for incremental dimensionality reduced reinforcement learning

State-of-the-art personal robots must perform complex manipulation tasks to be viable in assistive scenarios. However, many of these robots, such as the PR2, use manipulators with many degrees of freedom. The complexity of these robots leads to high-dimensional state spaces, which are difficult to explore fully. Our previous work introduced the IDRRL algorithm, which compresses the learning space by projecting a high-dimensional learning space onto a lower-dimensional manifold while preserving expressivity. In this work we formally prove that IDRRL maintains PAC-MDP guarantees. We then improve upon our previous formulation of IDRRL by introducing cascading autoencoders (CAE) for dimensionality reduction, yielding the new algorithm IDRRL-CAE. We demonstrate the improvement of this extension over our previous formulation, IDRRL-PCA, in the Mountain Car and Swimmers domains.
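To make the dimensionality-reduction step concrete, the following is a minimal sketch of PCA-based state compression of the kind IDRRL-PCA relies on: a linear basis is fit from observed high-dimensional states, and each new state is projected onto the lower-dimensional manifold before being handed to the learner. The function names (`fit_pca`, `compress`) and the synthetic data are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def fit_pca(states, k):
    """Fit a rank-k PCA basis from a batch of observed states (n x d)."""
    mean = states.mean(axis=0)
    centered = states - mean
    # SVD of the centered data; rows of vt are the principal directions
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:k]  # mean: (d,), basis: (k, d)

def compress(state, mean, basis):
    """Project a d-dimensional state onto the k-dimensional manifold."""
    return basis @ (state - mean)

rng = np.random.default_rng(0)
# Simulate 8-D states that in fact lie near a 2-D subspace,
# mimicking a redundant manipulator whose motion is low-dimensional
latent = rng.normal(size=(500, 2))
lift = rng.normal(size=(2, 8))
states = latent @ lift + 0.01 * rng.normal(size=(500, 8))

mean, basis = fit_pca(states, k=2)
z = compress(states[0], mean, basis)
print(z.shape)  # (2,) -- the learner now explores 2 dimensions, not 8
```

A nonlinear reducer such as a cascading autoencoder would replace `fit_pca`/`compress` with an encoder network, trading the closed-form fit for the ability to capture curved manifolds.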
