Resource allocation with stochastic optimal control approach

A control-theoretic decision-making system is proposed for an agent (decision maker) to "optimally" allocate and deploy resources over time among a dynamically changing list of opportunities (e.g., financial assets) in an uncertain market environment. The solution is a sequence of actions chosen to optimize a total reward function. This control-theoretic approach is unique in the sense that it solves the problem at distinct time epochs over a finite time horizon and discovers strategies directly. Rather than basing the decision-making system on forecasts, or training it via a reinforcement learning algorithm on current state data, we train our system via a Q-learning algorithm using Geometric Brownian Motion as the asset price function. While the above problem is quite general, we focus solely on dynamic financial portfolio management, with the objective of maximizing expected utility for a given risk level. The performance measures that we consider for our system are realized mean return, drawdown, and standard deviation. We find that our model achieves a better return and drawdown than a known market index used as a benchmark.
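The training idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the state (epoch index plus the direction of the last price move), the three risky-asset weights, the log-utility reward, and every parameter value are assumptions chosen only to keep the example small. It simulates Geometric Brownian Motion price paths and runs tabular Q-learning over a finite horizon of discrete epochs.

```python
import numpy as np

def simulate_gbm(s0, mu, sigma, dt, n_steps, rng):
    """One GBM price path with n_steps + 1 points, using the exact log-normal step."""
    z = rng.standard_normal(n_steps)
    log_increments = (mu - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z
    return s0 * np.exp(np.concatenate(([0.0], np.cumsum(log_increments))))

def train_q(n_episodes=2000, n_steps=12, mu=0.08, sigma=0.2, dt=1 / 12,
            weights=(0.0, 0.5, 1.0), alpha=0.1, gamma=1.0, epsilon=0.1, seed=0):
    """Tabular Q-learning on simulated GBM paths (all defaults are illustrative)."""
    rng = np.random.default_rng(seed)
    # Assumed state: (epoch index, was the last move up?). Action: risky-asset weight.
    q = np.zeros((n_steps, 2, len(weights)))
    for _ in range(n_episodes):
        prices = simulate_gbm(1.0, mu, sigma, dt, n_steps, rng)
        up = 0  # no previous move at the first epoch; treated as "not up"
        for t in range(n_steps):
            # Epsilon-greedy action selection.
            if rng.random() < epsilon:
                a = int(rng.integers(len(weights)))
            else:
                a = int(np.argmax(q[t, up]))
            asset_return = prices[t + 1] / prices[t] - 1.0
            # Assumed reward: log growth of wealth (log utility); the rest is held in cash.
            reward = np.log1p(weights[a] * asset_return)
            next_up = int(asset_return > 0)
            # Finite horizon: no continuation value after the last epoch.
            future = 0.0 if t == n_steps - 1 else np.max(q[t + 1, next_up])
            q[t, up, a] += alpha * (reward + gamma * future - q[t, up, a])
            up = next_up
    return q

q_table = train_q()
policy = np.argmax(q_table, axis=2)  # greedy weight index per (epoch, state)
```

Here `policy[t, s]` is the index into `weights` that the learned table prefers at epoch `t` in state `s`. Because GBM increments are independent, the direction of the last move carries no real predictive information; it is included only to show how a state enters the Q-table.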
