Lexicographic refinements in stationary possibilistic Markov Decision Processes

Abstract. Possibilistic Markov Decision Processes offer a compact and tractable way to represent and solve sequential decision problems under qualitative uncertainty. Although appealing for its ability to handle qualitative problems, this model suffers from the drowning effect inherent to possibilistic decision theory: because its criteria rely on idempotent min and max aggregations, sufficiently plausible good or bad consequences can be "drowned" and distinct policies left indistinguishable. The present paper proposes to escape the drowning effect by extending to stationary possibilistic MDPs the lexicographic preference relations defined by Fargier and Sabbadin [14] for non-sequential decision problems. We propose a value iteration algorithm and a policy iteration algorithm to compute policies that are optimal for these new criteria. The practical feasibility of these algorithms is then assessed experimentally on different samples of possibilistic MDPs.
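To make the setting concrete, below is a minimal sketch of value iteration under the standard optimistic possibilistic criterion that the paper refines (in the spirit of [16] and [30]). It is an illustration under simplifying assumptions, not the paper's algorithm: the encoding, the function name, and the toy instance are invented for this sketch, and the lexicographic refinements studied in the paper would compare sorted vectors of degrees where this code compares single degrees.

```python
# Sketch: value iteration for the optimistic possibilistic criterion,
#   u(s) = max_a max_{s'} min(pi(s' | s, a), u(s')),
# on a finite qualitative scale. All names and the dictionary encoding are
# illustrative assumptions; this is NOT the lexicographic algorithm of the paper.

def optimistic_value_iteration(states, actions, pi, mu):
    """pi[s][a][s2]: possibility degree of reaching s2 from s by action a;
    mu[s]: intrinsic satisfaction degree of state s (same qualitative scale)."""
    u = dict(mu)                        # u_0 = mu; updates are monotone nondecreasing
    policy = {s: None for s in states}
    while True:
        changed = False
        for s in states:
            for a in actions:
                # optimistic (max-min) qualitative backup through action a
                val = max(min(pi[s][a][s2], u[s2]) for s2 in states)
                if val > u[s]:
                    u[s], policy[s] = val, a
                    changed = True
        if not changed:                 # fixpoint reached
            return u, policy

# Toy instance: 'move' from 'bad' fully possibly reaches 'good'.
states = ['bad', 'good']
actions = ['stay', 'move']
pi = {'bad':  {'stay': {'bad': 1.0, 'good': 0.0},
               'move': {'bad': 0.5, 'good': 1.0}},
      'good': {'stay': {'bad': 0.0, 'good': 1.0},
               'move': {'bad': 1.0, 'good': 0.3}}}
mu = {'bad': 0.1, 'good': 1.0}
print(optimistic_value_iteration(states, actions, pi, mu))
```

Termination is immediate on a qualitative scale: every backup value is a min of degrees already appearing in pi and mu, a finite set, and each update strictly increases some u[s], so a fixpoint is reached after finitely many sweeps.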

[1]  J. Schreiber. Foundations of Statistics, 2016.

[2]  Ronald A. Howard, et al. Dynamic Programming and Markov Processes, 1960.

[3]  Sean R. Eddy, et al. What is dynamic programming?, 2004, Nature Biotechnology.

[4]  Jérôme Lang, et al. Towards qualitative approaches to multi-stage decision making, 1998, Int. J. Approx. Reason.

[5]  Nahla Ben Amor, et al. Lexicographic Refinements in Possibilistic Decision Trees, 2016, ECAI.

[7]  L. Zadeh. Fuzzy sets as a basis for a theory of possibility, 1999.

[8]  J. von Neumann, et al. Theory of Games and Economic Behavior, 1945.

[9]  Yan Xu, et al. Optimizing Quantiles in Preference-Based Markov Decision Processes, 2016, AAAI.

[10]  Régis Sabbadin. Towards possibilistic reinforcement learning algorithms, 2001, 10th IEEE International Conference on Fuzzy Systems.

[11]  Jean-Loup Farges, et al. Qualitative Possibilistic Mixed-Observable MDPs, 2013, UAI.

[12]  Didier Dubois, et al. Qualitative Decision Theory with Sugeno Integrals, 1998, UAI.

[13]  Susana Montes, et al. Decision making with imprecise probabilities and utilities by means of statistical preference and stochastic dominance, 2014, Eur. J. Oper. Res.

[14]  Hélène Fargier, et al. Qualitative Decision under Uncertainty: Back to Expected Utility, 2003, IJCAI.

[15]  Didier Dubois, et al. Decision-theoretic foundations of qualitative possibility theory, 2001, Eur. J. Oper. Res.

[16]  Régis Sabbadin, et al. Possibilistic Markov decision processes, 2001.

[17]  Eyke Hüllermeier, et al. Qualitative Multi-Armed Bandits: A Quantile-Based Approach, 2015, ICML.

[18]  Thorsten Joachims, et al. The K-armed Dueling Bandits Problem, 2012, COLT.

[19]  Paul Weng, et al. Markov Decision Processes with Ordinal Rewards: Reference Point-Based Preferences, 2011, ICAPS.

[20]  Eduardo F. Morales, et al. An Introduction to Reinforcement Learning, 2011.

[21]  Arie Tzvieli. Possibility theory: An approach to computerized processing of uncertainty, 1990, J. Am. Soc. Inf. Sci.

[22]  Weiru Liu, et al. Anytime Algorithms for Solving Possibilistic MDPs and Hybrid MDPs, 2016, FoIKS.

[23]  Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.

[24]  Shivaram Kalyanakrishnan, et al. Improved Strong Worst-case Upper Bounds for MDP Planning, 2017, IJCAI.

[25]  Paul Weng, et al. Quantile Reinforcement Learning, 2016, arXiv.

[26]  Abraham Wald. Statistical Decision Functions, 1951.

[27]  R. Bellman. A Markovian Decision Process, 1957.

[28]  H. Moulin. Axioms of Cooperative Decision Making, 1988.

[29]  Didier Dubois, et al. Possibility Theory as a Basis for Qualitative Decision Theory, 1995, IJCAI.

[30]  Nahla Ben Amor, et al. Efficient Policies for Stationary Possibilistic Markov Decision Processes, 2017, ECSQARU.

[31]  Thomas Whalen, et al. Decisionmaking under uncertainty with various assumptions about available information, 1984, IEEE Transactions on Systems, Man, and Cybernetics.