Perception-based generalization in model-based reinforcement learning

In recent years, advances in robotics have allowed robots to venture into places too dangerous for humans. Unfortunately, the terrain in which these robots are deployed may not be known in advance, making it difficult to create motion programs robust enough to handle every scenario the robot may encounter. For this reason, researchers are adding learning capabilities that improve a robot's ability to adapt to its environment. Reinforcement learning is well suited to these robot domains because the desired outcome is often known while the best way to achieve it is not. In a real-world domain, a reinforcement-learning agent has to learn a great deal from experience, so it must be sample efficient. To be so, it must balance the exploration needed to model the environment properly against the need to use the information it has already obtained to complete its original task. In robot domains, exploration is especially costly in both time and energy. It is therefore important to make the best possible use of the robot's limited opportunities for exploration without degrading its performance. This dissertation discusses a specialization of the standard Markov Decision Process (MDP) framework that allows for easier transfer of experience between similar states, and introduces an algorithm that uses this new framework to explore more efficiently in robot-navigation problems. It then develops methods by which an agent can determine how to group similar states accurately. One proposed technique clusters states by their observed outcomes. To make it possible to extrapolate observed outcomes to as-yet unvisited states, a second approach uses perceptual information, such as the output of an image-processing system, to group perceptually similar states in the hope that they will also be related in terms of outcomes.
However, there are many different percepts from which a robot could obtain state groupings. To address this issue, a third algorithm is presented that determines how to group states when the agent has multiple, possibly conflicting, inputs from which to choose. Robot experiments with all of the proposed algorithms are included to demonstrate the improvements that these approaches can provide.
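
As a rough illustration of what clustering states by their observed outcomes could look like, the sketch below greedily groups states whose mean observed outcomes lie close together. It is not the dissertation's algorithm: the function name `cluster_states_by_outcome`, the scalar outcome representation (for example, distance travelled after a forward action), the `threshold` parameter, and the sample data are all assumptions made for this example.

```python
from statistics import mean


def cluster_states_by_outcome(outcomes, threshold=0.5):
    """Greedily group states whose mean observed outcomes are close.

    outcomes:  dict mapping a state id to a list of observed scalar
               outcomes (hypothetical, e.g. distance actually travelled).
    threshold: assumed maximum gap between a state's mean outcome and a
               cluster's mean outcome for the state to join that cluster.
    """
    clusters = []  # each cluster holds its member states and pooled samples
    for state, samples in sorted(outcomes.items()):
        state_mean = mean(samples)
        for cluster in clusters:
            if abs(mean(cluster["samples"]) - state_mean) <= threshold:
                # Similar outcomes: pool this state's experience with the cluster.
                cluster["states"].append(state)
                cluster["samples"].extend(samples)
                break
        else:
            # No existing cluster is close enough, so start a new one.
            clusters.append({"states": [state], "samples": list(samples)})
    return [cluster["states"] for cluster in clusters]


# Hypothetical data: s1 and s3 behave alike, as do s2 and s4.
observed = {
    "s1": [1.0, 0.9, 1.1],
    "s2": [0.2, 0.3],
    "s3": [1.05, 0.95],
    "s4": [0.25],
}
groups = cluster_states_by_outcome(observed)
print(groups)
```

Pooling samples within a cluster is what lets experience gathered in one state inform predictions for the others in the same group. A greedy single pass is order dependent; it is used here only to keep the sketch short.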
