Risk-averse reinforcement learning for algorithmic trading

We propose a general framework for risk-averse reinforcement learning in algorithmic trading. Our approach is evaluated on 1.5 years of millisecond-resolution limit order book data from NASDAQ, a period that includes the 2010 flash crash. The results show that our algorithm outperforms a risk-neutral reinforcement learning baseline by 1) keeping trading costs substantially lower at the moment the flash crash occurred, and 2) significantly reducing risk over the whole test period.
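The paper's specific algorithm is not reproduced here, but the general idea of risk-averse value learning can be sketched with a minimal tabular Q-learning example that penalizes reward variance (a standard mean-variance surrogate for risk aversion). All names, parameters, and the toy payoff distributions below are illustrative assumptions, not the paper's method or data:

```python
import random

def q_update(q, s, a, r, s2, alpha=0.1, gamma=0.95, lam=0.0, r_mean=0.0):
    # Variance-penalized reward: r - lam * (r - r_mean)^2 is a common
    # mean-variance surrogate for risk aversion; lam = 0 recovers
    # plain (risk-neutral) Q-learning.
    penalized = r - lam * (r - r_mean) ** 2
    target = penalized + gamma * max(q[s2].values())
    q[s][a] += alpha * (target - q[s][a])

def train(lam, steps=5000, seed=0):
    # Toy one-state "market": a risky order with a higher mean payoff
    # but high variance vs. a safe order with a small certain payoff.
    rng = random.Random(seed)
    q = {"s": {"risky": 0.0, "safe": 0.0}}
    for _ in range(steps):
        a = rng.choice(["risky", "safe"])          # uniform exploration
        r = rng.choice([4.0, -2.0]) if a == "risky" else 0.2
        q_update(q, "s", a, r, "s", lam=lam)
    return q["s"]

neutral = train(lam=0.0)  # risk-neutral values track the (higher) risky mean
averse = train(lam=0.5)   # the variance penalty pushes value toward "safe"
```

With the penalty active, the learned Q-values rank the low-variance action above the high-variance one even though the risky action has the higher expected payoff, which is the qualitative behavior a risk-averse trading agent should exhibit.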