A reinforcement learning extension to the Almgren-Chriss framework for optimal trade execution

Reinforcement learning is explored as a candidate machine learning technique for enhancing existing analytical solutions for optimal trade execution with elements of market microstructure. Given a volume to trade, a fixed time horizon and discrete trading periods, the aim is to adapt a given volume trajectory so that it responds dynamically to favourable or unfavourable conditions during real-time execution, thereby reducing the overall cost of trading. We consider the standard Almgren-Chriss model with linear price impact as a candidate base model; this model is popular amongst sell-side institutions as a basis for arrival-price benchmark execution algorithms. By training a learning agent to modify the volume trajectory based on the market's prevailing spread and volume dynamics, we are able to improve post-trade implementation shortfall by up to 10.3% on average compared with the base model, based on a sample of stocks and trade sizes in the South African equity market.
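
For context, the base model's schedule in the linear-impact case has a well-known closed form: with per-period variance sigma^2, temporary impact coefficient eta and risk aversion lambda, the optimal holdings decay as x(t) = X * sinh(kappa * (T - t)) / sinh(kappa * T), with kappa approximately sqrt(lambda * sigma^2 / eta) in the small-interval limit. The sketch below computes this static trajectory and then shows one plausible way a spread/volume signal could tilt a single slice while keeping the total volume fixed; the function names, parameter values and the tilt-and-renormalise rule are illustrative assumptions, not the agent trained in the paper.

```python
import numpy as np

def almgren_chriss_trajectory(X, T, N, sigma, eta, lam):
    """Static Almgren-Chriss holdings/trade schedule with linear temporary impact
    (small-interval approximation of the discrete-time solution)."""
    kappa = np.sqrt(lam * sigma**2 / eta)        # decay rate of the risk-averse schedule
    t = np.linspace(0.0, T, N + 1)
    holdings = X * np.sinh(kappa * (T - t)) / np.sinh(kappa * T)
    trades = -np.diff(holdings)                  # shares executed in each of the N periods
    return holdings, trades

def tilt_slice(trades, k, tilt):
    """Illustrative state-dependent adjustment: trade (1 + tilt) times the planned
    volume in period k (tilt > 0 for favourable spread/volume conditions), and
    rescale the remaining periods so the full order still completes by T."""
    adjusted = trades.astype(float).copy()
    delta = tilt * adjusted[k]                   # extra volume pulled forward (or deferred)
    adjusted[k] += delta
    rest = adjusted[k + 1:].sum()
    if rest > 0:
        adjusted[k + 1:] *= (rest - delta) / rest
    return adjusted

if __name__ == "__main__":
    holdings, trades = almgren_chriss_trajectory(
        X=1_000_000, T=1.0, N=10, sigma=0.02, eta=2.5e-7, lam=1e-6)
    print("static schedule:", np.round(trades))
    print("tilted schedule:", np.round(tilt_slice(trades, k=0, tilt=0.25)))
```

In the paper's setting, the tilt for each period would be chosen by the learned policy from the observed spread and volume state rather than supplied by hand as above.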
