Efficient reinforcement learning using Gaussian processes

This book examines Gaussian processes (GPs) in two settings: model-based reinforcement learning (RL) and inference in nonlinear dynamic systems. First, we introduce PILCO, a fully Bayesian approach for efficient RL in continuous-valued state and action spaces when no expert knowledge is available. PILCO consistently accounts for model uncertainty during long-term planning, which reduces model bias. Second, we propose principled algorithms for robust filtering and smoothing in GP dynamic systems.
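To make "accounting for model uncertainty during long-term planning" concrete, here is a minimal sketch in standard-library Python. It fits a one-dimensional GP dynamics model x_{t+1} = f(x_t) to noise-free samples of an assumed toy system f(x) = sin(x), then propagates a Gaussian initial state over several steps, drawing each next state from the GP's predictive distribution so that model uncertainty feeds into the long-term prediction. PILCO itself propagates uncertainty analytically (moment matching) and optimizes a policy; this sketch substitutes Monte Carlo sampling and has no controller. All names (`GP1D`, `propagate`) and hyperparameter values are hypothetical choices for illustration.

```python
import math
import random


def se_kernel(a, b, length=1.0, signal=1.0):
    """Squared-exponential covariance between two scalar inputs."""
    return signal ** 2 * math.exp(-0.5 * (a - b) ** 2 / length ** 2)


def cholesky(K):
    """Lower-triangular L with L @ L^T == K (K symmetric positive definite)."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(K[i][i] - s) if i == j else (K[i][j] - s) / L[j][j]
    return L


def solve_lower(L, b):
    """Forward substitution: solve L x = b."""
    x = [0.0] * len(b)
    for i in range(len(b)):
        x[i] = (b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i]
    return x


def solve_upper_t(L, b):
    """Back substitution: solve L^T x = b."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


class GP1D:
    """Hypothetical 1-D GP regression model with fixed hyperparameters."""

    def __init__(self, xs, ys, noise=0.1, length=1.0, signal=1.0):
        self.xs, self.length, self.signal = xs, length, signal
        K = [[se_kernel(a, b, length, signal) + (noise ** 2 if i == j else 0.0)
              for j, b in enumerate(xs)] for i, a in enumerate(xs)]
        self.L = cholesky(K)
        # alpha = K^{-1} y via two triangular solves
        self.alpha = solve_upper_t(self.L, solve_lower(self.L, ys))

    def predict(self, x):
        """Posterior mean and variance of the latent function at x."""
        k = [se_kernel(x, xi, self.length, self.signal) for xi in self.xs]
        mean = sum(ki * ai for ki, ai in zip(k, self.alpha))
        v = solve_lower(self.L, k)
        var = se_kernel(x, x, self.length, self.signal) - sum(vi * vi for vi in v)
        return mean, max(var, 0.0)


def propagate(gp, mu0, sd0, horizon, n_samples=2000, seed=0):
    """Monte Carlo propagation of N(mu0, sd0^2) through the GP model.

    Each step samples the next state from the GP predictive distribution,
    so model uncertainty accumulates; returns (mean, variance) per step.
    """
    rng = random.Random(seed)
    particles = [rng.gauss(mu0, sd0) for _ in range(n_samples)]
    stats = []
    for _ in range(horizon):
        nxt = []
        for x in particles:
            m, v = gp.predict(x)
            nxt.append(rng.gauss(m, math.sqrt(v)))
        particles = nxt
        mu = sum(particles) / len(particles)
        var = sum((p - mu) ** 2 for p in particles) / len(particles)
        stats.append((mu, var))
    return stats


# Toy training data from the assumed dynamics f(x) = sin(x) on [-3, 3].
xs = [-3.0 + 0.6 * i for i in range(11)]
ys = [math.sin(x) for x in xs]
gp = GP1D(xs, ys)
stats = propagate(gp, mu0=1.0, sd0=0.1, horizon=5)
for t, (mu, var) in enumerate(stats, start=1):
    print(f"step {t}: mean={mu:.3f} var={var:.4f}")
```

Far from the training inputs the GP's predictive variance grows back toward the prior signal variance, which is the kind of model uncertainty a planner that ignores it would silently treat as certainty.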
