ARES: Adaptive Receding-Horizon Synthesis of Optimal Plans

We introduce ARES, an efficient approximation algorithm for generating optimal plans (action sequences) that take an initial state of a Markov Decision Process (MDP) to a state whose cost is below a specified convergence threshold. ARES uses Particle Swarm Optimization, with adaptive sizing for both the receding horizon and the particle swarm. Inspired by Importance Splitting, ARES chooses the length of the horizon and the number of particles such that at least one particle reaches a next-level state, that is, a state where the cost decreases by a required delta from the previous-level state. The level relation on states and the plans constructed by ARES implicitly define a Lyapunov function and an optimal policy, respectively, both of which could be explicitly generated by applying ARES to all states of the MDP, up to some topological equivalence relation. We also assess the effectiveness of ARES by statistically evaluating its rate of success in generating optimal plans. The ARES algorithm resulted from our desire to clarify whether flying in V-formation is a flocking policy that optimizes energy conservation, clear view, and velocity alignment. That is, we wanted to know whether one could find optimal plans that bring a flock from an arbitrary initial state to a state exhibiting a single connected V-formation. For flocks with 7 birds, ARES is able to generate a plan that leads to a V-formation in 95% of the 8,000 random initial configurations within 63 s, on average. ARES can also be easily customized into a model-predictive controller (MPC) with an adaptive receding horizon and statistical guarantees of convergence. To the best of our knowledge, our adaptive-sizing approach is the first to provide convergence guarantees in receding-horizon techniques.
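The adaptive-sizing idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation and makes several assumptions that are labeled here and in the code: a hypothetical toy deterministic system (a point in the plane moved by bounded velocity commands, with cost equal to the distance to the origin) stands in for the MDP and the flocking model; the names `A_MAX`, `phi`, `delta`, `h_max`, `p_min`, `p_max`, the doubling schedule for the swarm size, and the PSO coefficients are illustrative choices, not values from the paper. For each level, the sketch grows the horizon (and, for each horizon, the swarm) until a particle reaches a next-level state.

```python
import math
import random

# Hypothetical toy system standing in for the MDP (NOT the paper's flocking model):
# a point in the plane moved by a bounded velocity command at each step.
A_MAX = 0.2            # assumed bound on each action component
GOAL = (0.0, 0.0)


def step(state, action):
    """Deterministic toy dynamics: next state = state + action."""
    return (state[0] + action[0], state[1] + action[1])


def cost(state):
    """Cost of a state: Euclidean distance to GOAL."""
    return math.hypot(state[0] - GOAL[0], state[1] - GOAL[1])


def rollout(state, plan):
    """Apply a flat plan [a1x, a1y, a2x, a2y, ...] and return the final state."""
    for i in range(0, len(plan), 2):
        state = step(state, (plan[i], plan[i + 1]))
    return state


def clip(v):
    """Keep an action component inside [-A_MAX, A_MAX]."""
    return max(-A_MAX, min(A_MAX, v))


def pso(state, horizon, n_particles, rng, iters=30, w=0.7, c1=1.5, c2=1.5):
    """Global-best PSO over action sequences of length `horizon`.

    Returns the best plan found and the cost of the state it reaches.
    """
    dim = 2 * horizon
    pos = [[rng.uniform(-A_MAX, A_MAX) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pcost = [cost(rollout(state, p)) for p in pos]
    g = min(range(n_particles), key=pcost.__getitem__)
    gbest, gcost = pbest[g][:], pcost[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = clip(pos[i][d] + vel[i][d])
            c = cost(rollout(state, pos[i]))
            if c < pcost[i]:
                pbest[i], pcost[i] = pos[i][:], c
                if c < gcost:
                    gbest, gcost = pos[i][:], c
    return gbest, gcost


def ares_like(state, phi=0.05, delta=0.3, h_max=10, p_min=5, p_max=40, seed=0):
    """Simplified adaptive receding-horizon loop in the spirit of ARES.

    For each level, grow the horizon h (and, for each h, double the swarm size)
    until PSO finds a segment whose end state has cost at most
    max(cost - delta, phi), i.e. a next-level state or a converged one.
    Returns (plan, final_state, success).
    """
    rng = random.Random(seed)
    plan = []
    while cost(state) > phi:
        target = max(cost(state) - delta, phi)  # required next-level cost
        seg = None
        for h in range(1, h_max + 1):
            p = p_min
            while p <= p_max:
                cand, c = pso(state, h, p, rng)
                if c <= target:
                    seg = cand
                    break
                p *= 2  # assumed schedule: double the swarm before growing h
            if seg is not None:
                break
        if seg is None:
            return plan, state, False   # no next-level state within the budget
        plan.extend(seg)
        state = rollout(state, seg)
    return plan, state, True


if __name__ == "__main__":
    plan, final, ok = ares_like((2.0, 1.5))
    print(ok, len(plan) // 2, "steps, final cost", round(cost(final), 3))
```

Because a segment is accepted only if it lowers the cost by at least `delta` or ends at or below `phi`, the outer loop accepts at most ⌈(cost(s₀) − phi)/delta⌉ segments; this guaranteed per-level decrease is the property the abstract relates to a Lyapunov function. Searching short horizons and small swarms first keeps each level cheap and spends more computation only where the cost landscape demands it.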
