Optimistic planning for continuous-action deterministic systems

We consider the class of online planning algorithms for optimal control, which, compared to dynamic programming, are relatively unaffected by large state dimensionality. We introduce a novel planning algorithm called SOOP that works for deterministic systems with continuous states and actions. SOOP is the first method to explore the true solution space, consisting of infinite sequences of continuous actions, without requiring knowledge about the smoothness of the system. SOOP can be used parameter-free at the cost of more model calls, but we also propose a more practical variant tuned by a parameter α, which trades off finer action discretization against longer planning horizons. Experiments on three problems show that SOOP reliably ranks among the best algorithms, fully dominating the competing methods when the problem requires both long horizons and fine discretization.
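To make the optimistic idea behind this family of planners concrete, below is a minimal, hypothetical Python sketch of optimistic planning over action sequences for a deterministic, discounted system with rewards in [0, 1]: at each iteration, the partial action sequence with the highest upper bound (b-value) on the discounted return is expanded one step deeper. This is only an illustration of the underlying principle, not the SOOP algorithm itself; the dynamics f, reward rho, discount gamma, the fixed action discretization, and the call budget are all assumptions made for the example (SOOP refines the continuous action space adaptively rather than using a fixed grid).

```python
import heapq
import itertools
import math


def optimistic_plan(x0, f, rho, actions, gamma=0.95, n_calls=2000):
    """Expand the partial action sequence with the highest upper bound
    (b-value) on the discounted return; return the best first action found.

    f(x, u)   -> next state (deterministic dynamics, assumed interface)
    rho(x, u) -> reward in [0, 1] (assumed interface)
    actions   -> finite list of candidate actions (a fixed discretization
                 here; SOOP instead refines continuous actions adaptively)
    """
    tiebreak = itertools.count()  # avoids comparing states when b-values tie
    # Heap entries: (-b_value, tiebreak, depth, state, partial return, first action)
    heap = [(-1.0 / (1.0 - gamma), next(tiebreak), 0, x0, 0.0, None)]
    best_value, best_action, calls = -math.inf, actions[0], 0

    while heap and calls < n_calls:
        _, _, d, x, v, a0 = heapq.heappop(heap)
        if a0 is not None and v > best_value:
            best_value, best_action = v, a0
        for u in actions:  # expand the most promising node one step deeper
            x_next, r = f(x, u), rho(x, u)
            calls += 1
            v_next = v + (gamma ** d) * r
            # Optimistic bound: observed return plus the best possible tail.
            b = v_next + (gamma ** (d + 1)) / (1.0 - gamma)
            heapq.heappush(heap, (-b, next(tiebreak), d + 1, x_next, v_next,
                                  u if a0 is None else a0))
    return best_action


# Toy usage on a one-dimensional system: steer the state toward zero.
f = lambda x, u: x + 0.1 * u                 # assumed dynamics
rho = lambda x, u: math.exp(-abs(x))         # reward in (0, 1]
u0 = optimistic_plan(2.0, f, rho, actions=[-1.0, 0.0, 1.0])
```

In the paper's setting, the parameter α mentioned above governs how the computational budget is split between refining the action discretization and expanding deeper in the planning horizon; the sketch fixes both for simplicity.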
