Portfolio Choices with Orthogonal Bandit Learning

The investigation and development of new methods that shed light on portfolio choice problems from diverse perspectives has never stagnated in financial research. Recently, multi-armed bandits have drawn intensive attention in various machine learning applications in online settings. The tradeoff between exploration and exploitation that bandit algorithms manage in order to maximize rewards naturally establishes a connection to portfolio choice problems. In this paper, we present a bandit algorithm for making online portfolio choices that effectively exploits correlations among multiple arms. By constructing orthogonal portfolios from multiple assets and integrating them into the upper confidence bound bandit framework, we derive the optimal portfolio strategy, which combines passive and active investments according to a risk-adjusted reward function. Compared with frequently cited trading strategies from the finance and machine learning literature on representative real-world market datasets, the proposed algorithm achieves superior risk-adjusted return and cumulative wealth.
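The abstract combines two ingredients: orthogonal portfolios built from correlated assets, and upper confidence bound (UCB) selection driven by a risk-adjusted reward. The Python sketch below illustrates how these pieces might fit together. It is not the paper's algorithm, which blends passive and active components. The sketch rests on assumptions of our own: the orthogonal portfolios are the eigenvectors of a rolling sample covariance (principal components); each eigenvector's sign is fixed so that its weights sum to a nonnegative value, and each is rescaled to unit gross exposure; the reward is an in-window mean/std (Sharpe-like) ratio; and the exploration bonus is the UCB1 term sqrt(2 ln t / n). The names `orthogonal_portfolios`, `orthogonal_ucb` and the `window` parameter are hypothetical.

```python
import numpy as np


def orthogonal_portfolios(window_returns):
    """Eigen-decompose the sample covariance of a (T x N) return window.

    Returns eigenvalues in descending order and an N x N matrix whose
    orthonormal columns are the matching eigenvector portfolios.
    """
    cov = np.cov(window_returns, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # ascending order for symmetric input
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]


def orthogonal_ucb(returns, window=20):
    """Hypothetical UCB1-style selection among orthogonal portfolios.

    returns: (T x N) array of simple asset returns.
    Returns final wealth (starting from 1.0) and per-arm pull counts.
    """
    T, N = returns.shape
    counts = np.zeros(N)
    wealth = 1.0
    for t in range(window, T):
        past = returns[t - window:t]
        _, W = orthogonal_portfolios(past)
        # Fix the arbitrary eigenvector sign so weights sum to >= 0 (assumption).
        W = W * np.where(W.sum(axis=0) < 0, -1.0, 1.0)
        # Rescale each portfolio to unit gross exposure (assumption).
        W = W / np.abs(W).sum(axis=0)
        arm_returns = past @ W  # in-window returns of each orthogonal portfolio
        # Risk-adjusted reward: in-window mean / std, a Sharpe-like ratio (assumption).
        sharpe = arm_returns.mean(axis=0) / (arm_returns.std(axis=0) + 1e-12)
        rounds = t - window + 1
        bonus = np.sqrt(2.0 * np.log(rounds) / np.maximum(counts, 1))
        # Unplayed arms get an infinite index so each is tried once first.
        ucb = np.where(counts == 0, np.inf, sharpe + bonus)
        k = int(np.argmax(ucb))  # arm identified by eigenvalue rank
        wealth *= 1.0 + returns[t] @ W[:, k]  # realize the next-period return
        counts[k] += 1
    return wealth, counts


# Usage on synthetic data (illustrative only, not a market dataset).
rng = np.random.default_rng(0)
R = rng.normal(0.0005, 0.01, size=(250, 5))
final_wealth, pulls = orthogonal_ucb(R, window=30)
print(final_wealth, pulls)
```

Eigenvector signs and orderings can shift from one window to the next. In this sketch, an arm is therefore identified only by its eigenvalue rank, and a fuller treatment would need to handle that instability.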
