Reward Adjustment Reinforcement Learning for Risk-averse Asset Allocation

Over the past decade, the application of reinforcement learning (RL) to asset allocation and portfolio management has attracted much attention. However, most classical RL algorithms do not take risk into account, which may lead to hazardous trading decisions. In this paper, we propose a risk-averse RL method, named reward adjustment reinforcement learning. Our method incorporates risk into the classical RL framework by adjusting the reward with a risk penalty obtained from the GARCH model. This approach is generally easier to implement and analyze than other risk-averse models. We give an analysis that reveals the connection between our method and existing risk-averse RL methods. Experimental results on artificial data and on real data from the Hong Kong stock market compare the performance of our method with that of a risk-sensitive RL algorithm and illustrate that our method generalizes better.
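As a rough illustration of the reward adjustment idea, the sketch below computes a conditional variance with the standard GARCH(1,1) recursion and subtracts a weighted copy of it from the raw reward. The abstract does not state the exact form of the penalty, so the linear penalty, the function names `garch11_variance` and `adjusted_rewards`, the parameter values, and the choice of the sample variance as the starting value are all assumptions made for illustration only.

```python
import numpy as np

def garch11_variance(returns, omega, alpha, beta):
    """Conditional variance from the GARCH(1,1) recursion:
    sigma2[t] = omega + alpha * returns[t-1]**2 + beta * sigma2[t-1].
    """
    returns = np.asarray(returns, dtype=float)
    sigma2 = np.empty_like(returns)
    # Assumption: start the recursion at the sample variance of the returns.
    sigma2[0] = returns.var()
    for t in range(1, len(returns)):
        sigma2[t] = omega + alpha * returns[t - 1] ** 2 + beta * sigma2[t - 1]
    return sigma2

def adjusted_rewards(returns, sigma2, risk_weight):
    """Hypothetical adjusted reward: raw return minus a weighted risk penalty."""
    return np.asarray(returns, dtype=float) - risk_weight * np.asarray(sigma2)

# Illustrative inputs only; these are not data or parameters from the paper.
returns = [0.01, -0.02, 0.015, 0.005]
sigma2 = garch11_variance(returns, omega=1e-5, alpha=0.1, beta=0.8)
rewards = adjusted_rewards(returns, sigma2, risk_weight=2.0)
```

Under these assumptions, the adjusted rewards could be fed to an unmodified RL algorithm such as Q-learning in place of the raw returns, which is one reading of why the approach is described as easy to implement. A larger `risk_weight` corresponds to a more risk-averse agent.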
