Exploiting Distributional Temporal Difference Learning to Deal with Tail Risk

In traditional Reinforcement Learning (RL), agents learn to optimize actions in a dynamic context based on recursive estimation of expected values. We show that this form of machine learning fails when rewards (returns) are affected by tail risk, i.e., leptokurtosis. Here, we adapt a recent extension of RL, called distributional RL (disRL), and introduce estimation efficiency, while properly adjusting for the differential impact of outliers on the two terms of the RL prediction error in the updating equations. We show that the resulting “efficient distributional RL” (e-disRL) learns much faster and is robust once it settles on a policy. Our paper also provides a brief, nontechnical overview of machine learning, focusing on RL.
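To make the "two terms of the RL prediction error" concrete, here is a minimal sketch that assumes the textbook tabular TD(0) update. It is not the e-disRL procedure of the paper. The function name `td0_update`, the state labels, and the values of `alpha` and `gamma` are illustrative assumptions. The prediction error is split into a reward term and a discounted value-difference term, and a single outlying reward is shown to dominate the update.

```python
def td0_update(values, state, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one textbook TD(0) update in place and return the prediction error.

    alpha (learning rate) and gamma (discount factor) are illustrative choices.
    """
    # Term 1: the observed reward, which is where tail risk enters directly.
    reward_term = reward
    # Term 2: discounted value of the next state minus the current estimate.
    value_term = gamma * values[next_state] - values[state]
    delta = reward_term + value_term
    values[state] += alpha * delta
    return delta


# Hypothetical two-state example with all values initialised to zero.
values = {"A": 0.0, "B": 0.0}

typical = td0_update(values, "A", reward=1.0, next_state="B")
v_after_typical = values["A"]

# One extreme reward, as might occur under a leptokurtic distribution.
outlier = td0_update(values, "A", reward=50.0, next_state="B")
v_after_outlier = values["A"]

print(typical, v_after_typical, outlier, v_after_outlier)
```

In this sketch a single extreme reward moves the value estimate far more than the typical reward did, which illustrates why mean-based recursive updates can be sensitive to outliers.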
