Noisy K Best-Paths for Approximate Dynamic Programming with Application to Portfolio Optimization

We describe a general method for transforming a non-Markovian sequential decision problem into a supervised learning problem using a K-best-paths algorithm. We consider an application in financial portfolio management, where a controller can be trained to directly optimize a Sharpe Ratio (or other risk-averse, non-additive) utility function. We illustrate the approach with experimental results using a kernel-based controller architecture that would not normally be considered in traditional reinforcement learning or approximate dynamic programming. We further show that using a non-additive criterion (the incremental Sharpe Ratio) yields a noisy K-best-paths extraction problem, which can give substantially improved performance.
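For concreteness, the sketch below shows the standard annualized Sharpe Ratio that this kind of controller is trained to optimize; it is a minimal illustration only, assuming daily returns and a 252-day annualization factor, and the function name and defaults are ours rather than the paper's (the incremental variant used in the noisy K-best-paths formulation is not reproduced here).

```python
import numpy as np

def sharpe_ratio(returns, risk_free_rate=0.0, periods_per_year=252):
    """Annualized Sharpe Ratio of a series of periodic returns.

    Assumes `returns` are per-period (e.g. daily) simple returns and
    `risk_free_rate` is quoted per year; periods_per_year=252 is a
    daily-data convention.
    """
    excess = np.asarray(returns) - risk_free_rate / periods_per_year
    return np.sqrt(periods_per_year) * excess.mean() / excess.std(ddof=1)

# Usage example on a simulated daily return series
rng = np.random.default_rng(0)
daily_returns = rng.normal(loc=0.0005, scale=0.01, size=252)
print(sharpe_ratio(daily_returns))
```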
