Facilitating testing and debugging of Markov Decision Processes with interactive visualization

Researchers in AI and Operations Research employ the framework of Markov Decision Processes (MDPs) to formalize problems of sequential decision making under uncertainty. A common approach is to implement a simulator of the MDP's stochastic dynamics along with a Monte Carlo optimization algorithm that invokes this simulator to solve the MDP. The resulting software system is often assembled from several subsystems and functions that are collectively subject to failures of specification, implementation, integration, and optimization. We frame these failures as queries for a computational steering visual analytics system, MDPVIS. MDPVIS addresses three visualization research gaps. First, it addresses the data acquisition gap through a general simulator-visualization interface. Second, it addresses the data analysis gap through a generalized MDP information visualization. Finally, it addresses the cognition gap by exposing model components to the user. MDPVIS generalizes a visualization developed for wildfire management, and we use that domain to illustrate the system.

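The simulator-plus-optimizer architecture described above can be made concrete with a short sketch. The Python example below (all names are hypothetical and not taken from MDPVIS or any particular library) pairs a toy stochastic simulator with a Monte Carlo routine that estimates a policy's expected discounted return by repeatedly invoking the simulator. Real MDP simulators and optimizers are far more elaborate, but they compose in this same way, which is why a failure in any one component can silently corrupt the optimization result.

```python
import random

class GridSimulator:
    """Toy MDP: walk right along a line toward a goal at position 5.
    Each step may stochastically slip in the opposite direction."""

    def initial_state(self):
        return 0

    def transition(self, state, action):
        # Stochastic dynamics: the chosen move succeeds 80% of the time.
        move = action if random.random() < 0.8 else -action
        next_state = max(0, state + move)
        reward = 1.0 if next_state == 5 else 0.0
        done = next_state == 5
        return next_state, reward, done

def monte_carlo_value(simulator, policy, episodes=1000, gamma=0.95, horizon=50):
    """Estimate the expected discounted return of `policy` by sampling
    `episodes` rollouts from the simulator."""
    total = 0.0
    for _ in range(episodes):
        state, ret, discount = simulator.initial_state(), 0.0, 1.0
        for _ in range(horizon):
            state, reward, done = simulator.transition(state, policy(state))
            ret += discount * reward
            discount *= gamma
            if done:
                break
        total += ret
    return total / episodes

if __name__ == "__main__":
    always_right = lambda state: 1  # a fixed policy: always move right
    print(monte_carlo_value(GridSimulator(), always_right))
```

A bug in `transition` (a specification or implementation failure) or a premature horizon cutoff (an optimization failure) would bias the value estimate without raising any error, which is the class of silent failure that interactive visualization of rollouts is meant to expose.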