Risk-sensitive decision making via constrained expected returns

Decision making based on Markov decision processes (MDPs) is an emerging research area, as MDPs provide a convenient formalism for learning an optimal behavior with respect to a given reward. In many applications there are critical states that might harm the agent or the environment and should therefore be avoided. In practice, such states are often simply penalized with a negative reward, where the size of the penalty is chosen by trial and error. We therefore propose a modification of the well-known value iteration algorithm that guarantees that critical states are visited with at most a pre-set probability. Since this leads to an infeasible problem, we investigate nonlinear and linear approximations and discuss their effects. Two examples demonstrate the effectiveness of the proposed approach.
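The following is a minimal, illustrative sketch of the general idea of risk-constrained value iteration, not the authors' exact algorithm. It assumes a toy setup with a known transition tensor P[a, s, s'], a reward matrix R[a, s], a set of absorbing critical states, and a risk bound omega; the function name and all parameters are hypothetical. The sketch tracks, alongside the value estimate, the probability of ever reaching a critical state, and restricts the greedy action choice to actions that keep that probability below the bound.

```python
import numpy as np

def risk_constrained_value_iteration(P, R, critical, gamma=0.95, omega=0.05,
                                     n_iter=500):
    """Sketch of value iteration with a bound on critical-state visitation.

    P        -- transition tensor of shape (A, S, S)
    R        -- reward matrix of shape (A, S)
    critical -- iterable of critical (assumed absorbing) state indices
    omega    -- maximum admissible probability of ever visiting a critical state
    """
    n_actions, n_states, _ = P.shape
    crit = list(critical)

    V = np.zeros(n_states)        # expected return estimate
    rho = np.zeros(n_states)      # prob. of ever visiting a critical state
    rho[crit] = 1.0

    for _ in range(n_iter):
        # Action values and per-action risk under the current estimates.
        Q = R + gamma * (P @ V)                   # shape (A, S)
        risk = P @ rho                            # shape (A, S)
        risk[:, crit] = 1.0                       # critical states stay critical

        # Greedy step restricted to actions whose risk respects the bound;
        # if no action is admissible in a state, fall back to the least risky one.
        feasible = risk <= omega
        Q_masked = np.where(feasible, Q, -np.inf)
        best = np.where(feasible.any(axis=0),
                        Q_masked.argmax(axis=0),
                        risk.argmin(axis=0))

        idx = np.arange(n_states)
        V = Q[best, idx]
        rho = risk[best, idx]

    return V, rho, best
```

The fallback to the least risky action when no admissible action exists is one simple way to handle the infeasibility mentioned above; the paper instead studies nonlinear and linear approximations of the constrained problem.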
