Fast Optimization of Wildfire Suppression Policies with SMAC

Managers of US National Forests must decide which policy to apply when responding to lightning-caused wildfires. Conflicts among stakeholders (e.g., timber companies, homeowners, and wildlife biologists) have often led to spirited political debates and even violent eco-terrorism. One way to transform these conflicts into multi-stakeholder negotiations is to provide a high-fidelity simulation environment in which stakeholders can explore the space of alternative policies and understand the tradeoffs among them. Such an environment must support fast optimization of MDP policies so that users can adjust reward functions and analyze the resulting optimal policies. This paper assesses the suitability of SMAC, a black-box empirical function optimization algorithm, for rapid optimization of MDP policies. The paper describes five reward function components and four stakeholder constituencies. It then introduces a parameterized class of policies that stakeholders can easily understand. SMAC is applied to find the optimal policy in this class for the reward function of each stakeholder constituency. The results confirm that SMAC rapidly finds good policies that make sense from the domain perspective. Because the full-fidelity forest fire simulator is far too expensive to support interactive optimization, SMAC is applied to a surrogate model constructed from a modest number of runs of the full-fidelity simulator. To check the quality of the SMAC-optimized policies, they are then evaluated on the full-fidelity simulator. The results confirm that the surrogate value estimates are valid. This is the first successful optimization of wildfire management policies using a full-fidelity simulation. The same methodology should be applicable to other contentious natural-resource management problems where high-fidelity simulation is extremely expensive.
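As an illustration of the workflow described above (a small parameterized policy class, a surrogate model standing in for the expensive simulator, and SMAC as the black-box optimizer), a minimal sketch is given below. It is not the authors' code: it assumes the SMAC3 Python interface (version 2.x), and the two threshold parameters, the surrogate_return function, and the evaluation budget are hypothetical placeholders for the policy class, the fitted surrogate, and the stakeholder reward functions described in the paper.

import numpy as np
from ConfigSpace import ConfigurationSpace
from ConfigSpace.hyperparameters import UniformFloatHyperparameter
from smac import HyperparameterOptimizationFacade, Scenario

rng = np.random.default_rng(0)

def surrogate_return(erc_threshold, day_threshold):
    # Hypothetical surrogate: predicted return of a policy that suppresses a
    # fire only when the Energy Release Component exceeds erc_threshold or the
    # day of year exceeds day_threshold. A real surrogate would be fit to a
    # modest number of full-fidelity simulator runs.
    value = -((erc_threshold - 60.0) ** 2) / 50.0 \
            - ((day_threshold - 200.0) ** 2) / 400.0
    return value + rng.normal(scale=0.5)  # stand-in for simulator noise

def target(config, seed: int = 0) -> float:
    # SMAC minimizes its target, so return the negated surrogate policy value.
    return -surrogate_return(config["erc_threshold"], config["day_threshold"])

cs = ConfigurationSpace(seed=0)
cs.add_hyperparameters([
    UniformFloatHyperparameter("erc_threshold", 0.0, 100.0),
    UniformFloatHyperparameter("day_threshold", 0.0, 365.0),
])

scenario = Scenario(cs, n_trials=200)        # budget of surrogate evaluations
smac = HyperparameterOptimizationFacade(scenario, target)
incumbent = smac.optimize()                  # best policy parameters found
print(incumbent)

In the setting the abstract describes, the incumbent policy returned by smac.optimize() would then be re-run on the full-fidelity simulator to verify that the surrogate's value estimate holds up.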
