Performance functions and reinforcement learning for trading systems and portfolios

We propose to train trading systems and portfolios by optimizing objective functions that directly measure trading and investment performance. Rather than basing a trading system on forecasts or training via a supervised learning algorithm using labelled trading data, we train our systems using recurrent reinforcement learning (RRL) algorithms. The performance functions that we consider for reinforcement learning are profit or wealth, economic utility, the Sharpe ratio, and our proposed differential Sharpe ratio. The trading and portfolio management systems require prior decisions as input in order to properly take into account the effects of transaction costs, market impact, and taxes. This temporal dependence on system state requires the use of reinforcement versions of standard recurrent learning algorithms. We present empirical results in controlled experiments that demonstrate the efficacy of some of our methods for optimizing trading systems and portfolios. For a long/short trader, we find that maximizing the differential Sharpe ratio yields more consistent results than maximizing profits, and that both methods outperform a trading system based on forecasts that minimize MSE. We find that portfolio traders trained to maximize the differential Sharpe ratio achieve better risk-adjusted returns than those trained to maximize profit. Finally, we provide simulation results for an S&P 500/T-Bill asset allocation system that demonstrate the presence of out-of-sample predictability in the monthly S&P 500 stock index for the 25-year period 1970 through 1994. Copyright © 1998 John Wiley & Sons, Ltd.
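The abstract names the performance functions but does not give formulas. As a minimal sketch only, the Python below shows (a) why the previous decision must be an input once transaction costs are charged, and (b) one commonly stated form of a differential Sharpe ratio: the first-order sensitivity of an exponentially weighted Sharpe ratio to the adaptation rate. The function names, the timing convention, the proportional cost model, and the zero fallback for a degenerate variance are all assumptions made for illustration, not details taken from the abstract.

```python
def trading_returns(price_changes, positions, cost_rate):
    """Per-period trading returns for positions in [-1, 1].

    Assumption: positions[t] is the position held over period t, and a
    proportional cost is charged on the change from the previous position.
    The cost term is why the previous decision has to be an input.
    """
    returns = []
    prev_position = 0.0  # assumption: the trader starts flat
    for change, position in zip(price_changes, positions):
        cost = cost_rate * abs(position - prev_position)
        returns.append(position * change - cost)
        prev_position = position
    return returns


def dsr_step(a_prev, b_prev, r):
    """Differential Sharpe ratio for one new return r.

    a_prev and b_prev are running estimates of the first and second
    moments of returns. The value is the derivative, at eta = 0, of
    (a + eta*da) / sqrt((b + eta*db) - (a + eta*da)**2) with respect to eta.
    """
    da = r - a_prev
    db = r * r - b_prev
    variance = b_prev - a_prev * a_prev
    if variance <= 1e-12:
        return 0.0  # assumption: no signal until the variance estimate is positive
    return (b_prev * da - 0.5 * a_prev * db) / variance ** 1.5


def differential_sharpe(returns, eta=0.01):
    """Run dsr_step over a return series, updating the moments online."""
    a, b = 0.0, 0.0
    values = []
    for r in returns:
        values.append(dsr_step(a, b, r))
        a += eta * (r - a)       # exponential moving average of returns
        b += eta * (r * r - b)   # exponential moving average of squared returns
    return values


if __name__ == "__main__":
    rets = trading_returns([0.01, -0.02, 0.03], [1.0, 1.0, -1.0], cost_rate=0.001)
    print(rets)
    print(differential_sharpe(rets, eta=0.1))
```

Because each value depends only on the newest return and two running moments, a quantity of this kind can serve as a per-step reward for online training, which is the role the abstract assigns to the differential Sharpe ratio.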
