Markov decision processes with fuzzy rewards

In this paper, we consider a model in which the information on the rewards in vector-valued Markov decision processes is imprecise or ambiguous. The fuzzy reward model is analyzed as follows. The fuzzy reward is represented by a fuzzy set on the multi-dimensional Euclidean space R, and the infinite-horizon fuzzy expected discounted reward (FEDR) from any stationary policy is characterized as the unique fixed point of the corresponding contractive operator. We also find a Pareto optimal policy that maximizes the infinite-horizon FEDR over all stationary policies under the pseudo-order induced by a convex cone R. As a numerical example, the machine maintenance problem is considered.
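
The fixed-point characterization can be illustrated with a minimal sketch. It is not the paper's construction: it assumes a scalar reward (one component of the vector-valued case), triangular fuzzy rewards given as hypothetical triples (low, mode, high), and a fixed stationary policy summarized by its transition matrix P. For a given level alpha, the alpha-cut of each fuzzy reward is an interval, and because the transition probabilities are non-negative, the lower and upper endpoints of the discounted reward can be iterated separately. The names `alpha_cut` and `evaluate_policy` and all numbers are illustrative assumptions.

```python
def alpha_cut(tri, alpha):
    """Alpha-cut [lower, upper] of a triangular fuzzy number (low, mode, high)."""
    low, mode, high = tri
    return (low + alpha * (mode - low), high - alpha * (high - mode))


def evaluate_policy(P, fuzzy_rewards, beta, alpha, tol=1e-10, max_iter=10_000):
    """Iterate v <- r + beta * P v on interval endpoints until it stops changing.

    P             : row-stochastic transition matrix under the fixed policy
    fuzzy_rewards : one (low, mode, high) triple per state (assumed shape)
    beta          : discount factor in [0, 1), the contraction modulus
    alpha         : membership level in [0, 1]
    Returns one (lower, upper) interval per state.
    """
    n = len(P)
    cuts = [alpha_cut(r, alpha) for r in fuzzy_rewards]
    v = [(0.0, 0.0)] * n
    for _ in range(max_iter):
        new_v = []
        for s in range(n):
            lower = cuts[s][0] + beta * sum(P[s][t] * v[t][0] for t in range(n))
            upper = cuts[s][1] + beta * sum(P[s][t] * v[t][1] for t in range(n))
            new_v.append((lower, upper))
        # Largest endpoint change across all states in this sweep.
        gap = max(abs(a - b) for x, y in zip(new_v, v) for a, b in zip(x, y))
        v = new_v
        if gap < tol:
            break
    return v


if __name__ == "__main__":
    # Hypothetical two-state machine: state 0 = working, state 1 = degraded.
    P = [[0.9, 0.1],
         [0.4, 0.6]]
    rewards = [(8.0, 10.0, 11.0), (1.0, 2.0, 4.0)]
    for level in (0.0, 0.5, 1.0):
        print(level, evaluate_policy(P, rewards, beta=0.9, alpha=level))
```

Since the operator is a contraction with modulus beta, the iteration converges from any starting point, and the intervals shrink toward a single value as alpha approaches 1. Comparing vector-valued results under a cone-induced pseudo-order would need an additional dominance check that this sketch does not attempt.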
