Multi-Objective Markov Decision Processes for Data-Driven Decision Support

We present new methodology based on Multi-Objective Markov Decision Processes for developing sequential decision support systems from data. Our approach uses sequential decision-making data to provide support that is useful to many different decision-makers, each with different, potentially time-varying preferences. To accomplish this, we develop an extension of fitted-Q iteration for multiple objectives that computes policies for all scalarization functions, i.e., preference functions, simultaneously from continuous-state, finite-horizon data. We identify and address several conceptual and computational challenges along the way, and we introduce a new solution concept that is appropriate when different actions have similar expected outcomes. Finally, we demonstrate an application of our method using data from the Clinical Antipsychotic Trials of Intervention Effectiveness and show that our approach offers decision-makers increased choice by identifying a larger class of optimal policies.
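As background for the extension described above, the following is a minimal sketch of standard finite-horizon fitted-Q iteration with a vector-valued reward scalarized by a *fixed* preference vector w; the paper's contribution is to compute policies for all such w simultaneously, which this baseline does not attempt. The function name, data layout, and linear regression choice are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def linear_fitted_q(episodes, weights, horizon, n_actions):
    """Finite-horizon linear fitted-Q iteration for one preference vector.

    episodes: list of trajectories; each trajectory is a list of
              (state_features, action, reward_vector) tuples, one per stage.
    weights:  preference vector w; vector rewards are scalarized as w @ r.
    Returns per-stage linear Q-function coefficients, one vector per action.
    """
    d = len(episodes[0][0][0])
    betas = [[np.zeros(d) for _ in range(n_actions)] for _ in range(horizon)]
    for t in reversed(range(horizon)):          # backward recursion over stages
        for a in range(n_actions):
            X, y = [], []
            for ep in episodes:
                s, act, r = ep[t]
                if act != a:
                    continue
                target = float(weights @ r)     # scalarized immediate reward
                if t + 1 < horizon:
                    s_next = ep[t + 1][0]       # bootstrap from next stage
                    target += max(betas[t + 1][b] @ s_next
                                  for b in range(n_actions))
                X.append(s)
                y.append(target)
            if X:
                betas[t][a], *_ = np.linalg.lstsq(
                    np.asarray(X), np.asarray(y), rcond=None)
    return betas
```

A greedy policy at stage t then chooses argmax over actions a of betas[t][a] @ s; rerunning this for every candidate w is exactly the cost the paper's simultaneous approach avoids.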
