Interactive visualization for testing Markov Decision Processes: MDPVIS

Markov Decision Processes (MDPs) are a formulation for optimization problems in sequential decision making. Solving an MDP often requires implementing a simulator that optimization algorithms invoke while updating the decision-making rules known as policies. The combination of simulator and optimizer is subject to failures of specification, implementation, integration, and optimization that may produce invalid policies. We frame these failures as queries for a visual analytic system, MDPVIS. MDPVIS addresses three visualization research gaps. First, it addresses the data acquisition gap with a general simulator-visualization interface. Second, it addresses the data analysis gap with a generalized MDP information visualization. Finally, it addresses the cognition gap by exposing model components to the user. MDPVIS generalizes a visualization originally built for wildfire management. We use that domain to illustrate MDPVIS, and we demonstrate the visualization's generality by connecting it to two reinforcement learning frameworks that implement many MDPs of interest to the research community.

Highlights

- Markov decision processes (MDPs) formalize sequential decision optimization problems.
- Complex simulators often implement MDPs and are subject to a variety of bugs.
- Interactive visualizations support testing MDPs and optimization algorithms.
- MDPVIS, the first visualization targeting MDP testing, is presented.
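To make the simulator-optimizer relationship in the abstract concrete, below is a minimal sketch in Python of the loop it describes: an optimizer (or tester) repeatedly invokes a simulator's transition function to roll out a policy and estimate its expected discounted return. The GridworldSimulator, its reset/step methods, and the evaluate_policy helper are hypothetical illustrations for this sketch, not MDPVIS's actual simulator-visualization interface.

    import random

    class GridworldSimulator:
        """Hypothetical MDP simulator: a 1-D corridor where the agent
        starts at cell 0 and earns a reward for reaching cell `goal`."""

        def __init__(self, goal=5, slip_prob=0.1):
            self.goal = goal            # terminal state
            self.slip_prob = slip_prob  # chance an action is reversed
            self.state = 0

        def reset(self):
            self.state = 0
            return self.state

        def step(self, action):
            """Apply action (-1 or +1); return (next_state, reward, done)."""
            move = -action if random.random() < self.slip_prob else action
            self.state = max(0, self.state + move)
            done = self.state == self.goal
            reward = 1.0 if done else -0.01  # small cost per step
            return self.state, reward, done

    def evaluate_policy(sim, policy, episodes=1000, horizon=100, gamma=0.95):
        """Monte Carlo estimate of the expected discounted return,
        the quantity an MDP optimizer tries to maximize."""
        total = 0.0
        for _ in range(episodes):
            state, ret, discount = sim.reset(), 0.0, 1.0
            for _ in range(horizon):
                state, reward, done = sim.step(policy(state))
                ret += discount * reward
                discount *= gamma
                if done:
                    break
            total += ret
        return total / episodes

    # A bugged policy (always move left) scores far worse than "always
    # move right"; comparing rollout distributions like this is the kind
    # of check MDPVIS supports visually rather than as a single number.
    print(evaluate_policy(GridworldSimulator(), policy=lambda s: +1))
    print(evaluate_policy(GridworldSimulator(), policy=lambda s: -1))

A specification or implementation bug anywhere in this loop (in the transition function, the reward, or the policy) silently changes the returns being optimized, which is why the paper treats such failures as queries to be answered visually.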
