Event-Based Optimization for POMDPs and Its Application in Portfolio Management

Abstract: Partially observable Markov decision processes (POMDPs) provide a framework for optimizing Markov systems with multiple sources of uncertainty: in addition to the system's stochastic dynamics, the observations are noisy. Although POMDPs have many real applications, existing approaches to searching for a globally optimal policy are computationally intractable even for small systems. In this paper, we apply the recently developed event-based optimization approach to POMDP problems in the infinite-horizon setting. Based on this approach, a perturbation-analysis-based algorithm can be proposed to search for a locally optimal policy. Furthermore, under a certain condition, a policy-iteration-type algorithm can be developed. We find that this condition is satisfied for some partially observable systems in financial engineering. As an example, we discuss a portfolio management problem at the end of the paper.
