The K Best-Paths Approach to Approximate Dynamic Programming with Application to Portfolio Optimization

We describe a general method for transforming a non-Markovian sequential decision problem into a supervised learning problem using a K-best-paths algorithm. We consider an application in financial portfolio management in which a controller is trained to directly optimize the Sharpe ratio (or another risk-averse, non-additive utility function). We illustrate the approach with experimental results using a kernel-based controller architecture that would not normally be considered in traditional reinforcement learning or approximate dynamic programming.
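To make the transformation concrete, below is a minimal Python sketch of the idea under simplifying assumptions: exhaustive enumeration stands in for a proper K-shortest-paths routine, the "state" is reduced to (time step, previous portfolio weight), and the function names (sharpe_ratio, k_best_paths, paths_to_training_set) are hypothetical, not taken from the paper. The K highest-utility action sequences are found first, and the (state, action) pairs along those paths then serve as labeled examples for an ordinary supervised learner.

    import itertools
    import numpy as np

    def sharpe_ratio(returns):
        # Sharpe ratio of a per-period return sequence (risk-free rate taken as zero).
        returns = np.asarray(returns, dtype=float)
        return returns.mean() / (returns.std() + 1e-12)

    def k_best_paths(asset_returns, actions=(0.0, 0.5, 1.0), k=3):
        # Score every action sequence (path) over the horizon and keep the k
        # paths with the highest Sharpe ratio. An action is the fraction of
        # wealth held in the risky asset for that period. Exhaustive search
        # stands in for a real k-shortest-paths algorithm here, so this is
        # feasible only for toy horizons.
        horizon = len(asset_returns)
        scored = []
        for path in itertools.product(actions, repeat=horizon):
            portfolio_returns = [w * r for w, r in zip(path, asset_returns)]
            scored.append((sharpe_ratio(portfolio_returns), path))
        scored.sort(reverse=True)
        return scored[:k]

    def paths_to_training_set(best_paths):
        # Convert the k best paths into (state, action) pairs for a supervised
        # learner; here the state is simply (time step, previous weight).
        X, y = [], []
        for _, path in best_paths:
            prev_w = 0.0
            for t, w in enumerate(path):
                X.append((t, prev_w))
                y.append(w)
                prev_w = w
        return np.array(X), np.array(y)

    # Usage on a 4-period toy problem.
    rets = np.array([0.02, -0.01, 0.03, 0.01])
    best = k_best_paths(rets, k=3)
    X, y = paths_to_training_set(best)
    print(best[0])  # (best Sharpe ratio, corresponding weight sequence)

Note that because the Sharpe ratio is computed over the whole path, the utility is non-additive and cannot be decomposed into per-step rewards; this is precisely why a K-best-paths construction is attractive compared with standard additive dynamic programming.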
