The K Best-Paths Approach to Approximate Dynamic Programming with Application to Portfolio Optimization

We describe a general method for transforming a non-Markovian sequential decision problem into a supervised learning problem using a K-best-paths algorithm. We consider an application in financial portfolio management in which a controller is trained to directly optimize the Sharpe ratio (or another risk-averse, non-additive utility function). We illustrate the approach with experimental results using a kernel-based controller architecture that would not normally be considered in traditional reinforcement learning or approximate dynamic programming.
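To make the transformation concrete, below is a minimal Python sketch of the idea under simplifying assumptions: exhaustive enumeration stands in for a proper K-shortest-paths routine, the "state" is reduced to (time step, previous portfolio weight), and the function names (sharpe_ratio, k_best_paths, paths_to_training_set) are hypothetical, not taken from the paper. The K highest-utility action sequences are found first, and the (state, action) pairs along those paths then serve as labeled examples for an ordinary supervised learner.

    import itertools
    import numpy as np

    def sharpe_ratio(returns):
        # Sharpe ratio of a per-period return sequence (risk-free rate taken as zero).
        returns = np.asarray(returns, dtype=float)
        return returns.mean() / (returns.std() + 1e-12)

    def k_best_paths(asset_returns, actions=(0.0, 0.5, 1.0), k=3):
        # Score every action sequence (path) over the horizon and keep the k
        # paths with the highest Sharpe ratio. An action is the fraction of
        # wealth held in the risky asset for that period. Exhaustive search
        # stands in for a real k-shortest-paths algorithm here, so this is
        # feasible only for toy horizons.
        horizon = len(asset_returns)
        scored = []
        for path in itertools.product(actions, repeat=horizon):
            portfolio_returns = [w * r for w, r in zip(path, asset_returns)]
            scored.append((sharpe_ratio(portfolio_returns), path))
        scored.sort(reverse=True)
        return scored[:k]

    def paths_to_training_set(best_paths):
        # Convert the k best paths into (state, action) pairs for a supervised
        # learner; here the state is simply (time step, previous weight).
        X, y = [], []
        for _, path in best_paths:
            prev_w = 0.0
            for t, w in enumerate(path):
                X.append((t, prev_w))
                y.append(w)
                prev_w = w
        return np.array(X), np.array(y)

    # Usage on a 4-period toy problem.
    rets = np.array([0.02, -0.01, 0.03, 0.01])
    best = k_best_paths(rets, k=3)
    X, y = paths_to_training_set(best)
    print(best[0])  # (best Sharpe ratio, corresponding weight sequence)

Note that because the Sharpe ratio is computed over the whole path, the utility is non-additive and cannot be decomposed into per-step rewards; this is precisely why a K-best-paths construction is attractive compared with standard additive dynamic programming.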
