Variance-Penalized Reinforcement Learning for Risk-Averse Asset Allocation

The task of optimizing asset allocation under transaction costs can be formulated within the framework of Markov Decision Processes (MDPs) and reinforcement learning. In this paper, a risk-averse reinforcement learning algorithm is proposed that improves the asset allocation strategies of portfolio management systems. The proposed algorithm alternates between policy evaluation phases, which estimate the mean and variance of the return under a given policy, and policy improvement phases, which follow the variance-penalized criterion. The algorithm is tested on trading systems for a single futures contract on a Japanese stock index.
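The alternation between mean/variance evaluation and variance-penalized improvement can be illustrated with a small tabular sketch. It assumes a known finite MDP (a transition tensor `P` of shape (S, A, S) and deterministic rewards `R` of shape (S, A)), a discount factor `gamma`, and a penalty weight `kappa`. Evaluation solves linear systems for the mean `V` and second moment `M` of the discounted return, and improvement chooses, in each state, the action maximizing mean minus `kappa` times variance. This is a model-based stand-in, not the algorithm proposed in this paper. All function names and the toy MDP are hypothetical. Greedy variance-penalized improvement is not guaranteed to converge monotonically, which is why the loop has an iteration cap.

```python
import numpy as np


def evaluate_policy(P, R, policy, gamma):
    """Mean V and second moment M of the discounted return under a deterministic policy."""
    n = P.shape[0]
    idx = np.arange(n)
    P_pi = P[idx, policy]  # (S, S) transition matrix under the policy
    r_pi = R[idx, policy]  # (S,) reward collected under the policy
    I = np.eye(n)
    # Mean return: V = r + gamma * P V
    V = np.linalg.solve(I - gamma * P_pi, r_pi)
    # Second moment: M = r^2 + 2 gamma r (P V) + gamma^2 (P M)
    M = np.linalg.solve(I - gamma**2 * P_pi, r_pi**2 + 2 * gamma * r_pi * (P_pi @ V))
    return V, M


def improve_policy(P, R, V, M, gamma, kappa):
    """Greedy step on the variance-penalized criterion Q(s, a) - kappa * Var(s, a)."""
    PV = P @ V  # (S, A): expected next-state mean
    PM = P @ M  # (S, A): expected next-state second moment
    Q = R + gamma * PV
    Q2 = R**2 + 2 * gamma * R * PV + gamma**2 * PM
    var_Q = np.maximum(Q2 - Q**2, 0.0)  # clip tiny negative rounding errors
    return np.argmax(Q - kappa * var_Q, axis=1)


def variance_penalized_policy_iteration(P, R, gamma=0.9, kappa=0.5, max_iter=100):
    policy = np.zeros(P.shape[0], dtype=int)
    for _ in range(max_iter):
        V, M = evaluate_policy(P, R, policy, gamma)
        new_policy = improve_policy(P, R, V, M, gamma, kappa)
        if np.array_equal(new_policy, policy):
            break
        policy = new_policy
    V, M = evaluate_policy(P, R, policy, gamma)  # statistics of the returned policy
    return policy, V, np.maximum(M - V**2, 0.0)


def toy_mdp():
    """Hypothetical 5-state example: a safe and a risky action in state 0."""
    S, A = 5, 2
    P = np.zeros((S, A, S))
    R = np.zeros((S, A))
    P[0, 0, 1] = 1.0               # action 0 (safe): always reach state 1
    P[0, 1, 2] = P[0, 1, 3] = 0.5  # action 1 (risky): coin flip between states 2 and 3
    for s, payoff in [(1, 1.0), (2, 3.0), (3, 0.0)]:
        R[s, :] = payoff
        P[s, :, 4] = 1.0           # pay once, then move to terminal state 4
    P[4, :, 4] = 1.0               # absorbing terminal state with zero reward
    return P, R


if __name__ == "__main__":
    P, R = toy_mdp()
    for kappa in (0.0, 1.0):
        policy, V, var = variance_penalized_policy_iteration(P, R, gamma=0.9, kappa=kappa)
        print(kappa, policy, V[0], var[0])
```

With `kappa = 0` the risky action in state 0 wins on mean return (1.35 versus 0.9 at `gamma = 0.9`). With `kappa = 1` its return variance of 1.8225 outweighs that advantage, and the sketch switches to the safe action, which is the qualitative behavior a variance-penalized criterion is meant to produce.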