Risk Sensitive Reinforcement Learning Scheme Is Suitable for Learning on a Budget

Risk-sensitive reinforcement learning (risk-sensitive RL) has been studied by many researchers. These methods are based on prospect theory, which models the value function of a human. Although they are mainly aimed at imitating human behavior, there has been little discussion of their engineering significance. In this paper, we show that risk-sensitive RL is useful for online learning machines whose resources are limited. In such machines, part of the learned memory must be removed to make space for recording a new, important instance. The experimental results show that, under these conditions, risk-sensitive RL is superior to ordinary RL. This may also suggest that, because the human brain is built from a limited number of neurons, humans employ a risk-sensitive value function for learning.

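The following is a minimal sketch, not the authors' implementation, of the two ideas the abstract combines: a Q-function stored as a bounded instance memory that discards an old entry when the budget is exceeded, and a risk-sensitive update in which negative TD errors (losses) are weighted more heavily than positive ones, in the spirit of a prospect-theory-style value function. The class name, the `kappa` and `budget` parameters, and the least-recently-used replacement rule are illustrative assumptions.

```python
# Sketch: risk-sensitive Q-learning on a budgeted instance memory.
# Assumptions: discrete (hashable) states, LRU replacement, and an
# asymmetric piecewise-linear transform of the TD error.
import random
from collections import OrderedDict

class BudgetedRiskSensitiveQ:
    def __init__(self, n_actions, budget=50, alpha=0.1, gamma=0.95, kappa=0.5):
        self.n_actions = n_actions
        self.budget = budget          # maximum number of stored state entries
        self.alpha = alpha            # learning rate
        self.gamma = gamma            # discount factor
        self.kappa = kappa            # risk sensitivity: 0 gives ordinary (risk-neutral) RL
        self.memory = OrderedDict()   # state -> list of Q-values, kept in LRU order

    def _q(self, state):
        # Fetch (or create) the Q-values for a state and mark it as recently used.
        if state not in self.memory:
            if len(self.memory) >= self.budget:
                # Budget exceeded: remove the least recently used entry
                # to create space for the new instance.
                self.memory.popitem(last=False)
            self.memory[state] = [0.0] * self.n_actions
        self.memory.move_to_end(state)
        return self.memory[state]

    def act(self, state, epsilon=0.1):
        # Epsilon-greedy action selection over the stored Q-values.
        if random.random() < epsilon:
            return random.randrange(self.n_actions)
        q = self._q(state)
        return max(range(self.n_actions), key=lambda a: q[a])

    def update(self, state, action, reward, next_state, done):
        q = self._q(state)
        target = reward if done else reward + self.gamma * max(self._q(next_state))
        delta = target - q[action]
        # Risk-sensitive transform of the TD error: losses (negative errors)
        # are amplified and gains are attenuated, mimicking the asymmetry of
        # a human-like, prospect-theory-style value function.
        if delta >= 0:
            delta *= (1.0 - self.kappa)
        else:
            delta *= (1.0 + self.kappa)
        q[action] += self.alpha * delta
```

Setting `kappa = 0` recovers an ordinary risk-neutral update, so the same memory budget can be used to compare the two schemes, which is the kind of comparison the experiments report.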