Learning decisions: robustness, uncertainty, and approximation

Decision making under uncertainty is a central problem in robotics and machine learning. This thesis explores three fundamental and intertwined aspects of the problem of learning to make decisions. The first is uncertainty: classical optimal control techniques typically rely on perfect state information, a condition real-world problems never enjoy, and, perhaps more critically, classical algorithms fail to degrade gracefully as this assumption is violated. Closely tied to uncertainty is the problem of approximation: in large-scale problems, learning decisions inevitably requires approximation, and the difficulties of approximation within the framework of optimal control are well known. The third concern is robustness: especially in robotics applications, we often wish to operate learned controllers in domains where failure has serious consequences, so it is important that the decision policies we generate are robust both to uncertainty in our models of systems and to our inability to capture the true system dynamics accurately. We present new classes of algorithms that gracefully handle uncertainty, approximation, and robustness, paying attention to the computational aspects of both the problems and the algorithms developed. Finally, we provide case studies that serve both as motivation for the techniques and as illustrations of their applicability.
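To make the robustness-to-model-uncertainty theme concrete, below is a minimal sketch (in Python, assuming NumPy) of worst-case value iteration over a finite set of candidate transition models. This is a standard robust dynamic programming construction used here only for illustration, not the specific algorithms developed in the thesis; the function name `robust_value_iteration` and the array shapes are assumptions made for this example.

```python
import numpy as np

def robust_value_iteration(P_models, R, gamma=0.95, tol=1e-6, max_iter=1000):
    """Worst-case value iteration over a finite set of candidate transition models.

    P_models : (K, S, A, S) array; P_models[k, s, a, t] = Pr(t | s, a) under model k
    R        : (S, A) array of immediate rewards
    Returns the robust value function (shape S) and a greedy policy (shape S).
    """
    K, S, A, _ = P_models.shape
    V = np.zeros(S)
    for _ in range(max_iter):
        # Q[k, s, a]: one-step lookahead value of (s, a) under candidate model k
        Q = R[None, :, :] + gamma * np.einsum("ksat,t->ksa", P_models, V)
        # Nature picks the least favorable model, then the agent picks the best action
        Q_worst = Q.min(axis=0)          # (S, A)
        V_new = Q_worst.max(axis=1)      # (S,)
        if np.max(np.abs(V_new - V)) < tol:
            V = V_new
            break
        V = V_new
    policy = Q_worst.argmax(axis=1)
    return V, policy
```

Because nature chooses the least favorable model at every backup, the value the returned policy achieves under any model in the candidate set is no worse than the computed robust value, which is the guarantee one typically wants when models are uncertain.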
