Markowitz minimum variance portfolio optimization using new machine learning methods

Using an improved covariance matrix estimator in place of the sample covariance is considered an important approach to enhancing portfolio optimization. In this thesis, we propose using sparse inverse covariance estimation for Markowitz minimum variance portfolio optimization. We build on an existing method, Graphical Lasso [24], an algorithm that estimates the inverse covariance matrix from observations drawn from a multivariate Gaussian distribution. We begin by benchmarking Graphical Lasso and show the importance of regularization in controlling sparsity. Experimental results show that Graphical Lasso tends to overestimate the diagonal elements of the estimated inverse covariance matrix as the regularization increases. To remedy this, we introduce a new method for setting the optimal regularization parameter, which performs at least as well as the original method of [24]. Next, we apply Graphical Lasso to a bioinformatics problem: tissue classification from gene microarray data, where the number of genes is large relative to the number of samples. We reduce dimensionality by estimating Gaussian graphical models with Graphical Lasso. We then classify samples using the average expression level of each gene group rather than individual gene expression levels. Comparing classification performance against the sample covariance, we find that the sample covariance performs better. Finally, we combine Graphical Lasso with validation techniques that optimize portfolio criteria (risk, return, etc.) and the Gaussian likelihood. This produces new portfolio strategies for optimization with and without short-selling constraints.
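The core pipeline described above can be shown in a minimal sketch. First, Graphical Lasso estimates a sparse inverse covariance (precision) matrix Θ. That estimate is then plugged into the unconstrained Markowitz minimum-variance weights w = Θ1 / (1ᵀΘ1). This is an illustration under stated assumptions, not the thesis's implementation:

- It uses scikit-learn's `GraphicalLasso` and `GraphicalLassoCV`, whose details may differ from the original Graphical Lasso code.
- `GraphicalLassoCV` chooses the regularization by cross-validated Gaussian log-likelihood. This stands in for the likelihood-based validation mentioned above.
- Returns are synthetic, drawn from a Gaussian with a known tridiagonal precision matrix.
- The helpers `simulate_returns` and `min_variance_weights` are hypothetical names.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso, GraphicalLassoCV


def simulate_returns(n_assets=10, n_obs=250, seed=0):
    """Hypothetical helper: synthetic Gaussian returns with a sparse (tridiagonal) true precision."""
    rng = np.random.default_rng(seed)
    true_precision = 2.0 * np.eye(n_assets)
    idx = np.arange(n_assets - 1)
    true_precision[idx, idx + 1] = -0.8  # each asset is conditionally dependent
    true_precision[idx + 1, idx] = -0.8  # only on its immediate neighbours
    true_cov = np.linalg.inv(true_precision)
    returns = rng.multivariate_normal(np.zeros(n_assets), true_cov, size=n_obs)
    return returns, true_cov


def min_variance_weights(precision):
    """Hypothetical helper: global minimum-variance weights w = P1 / (1'P1), short selling allowed."""
    ones = np.ones(precision.shape[0])
    raw = precision @ ones
    return raw / (ones @ raw)


returns, true_cov = simulate_returns()
n_assets = returns.shape[1]
off_diag_mask = ~np.eye(n_assets, dtype=bool)

# Larger alpha (regularization) gives a sparser estimated precision matrix.
for alpha in (0.01, 0.1, 0.3):
    precision = GraphicalLasso(alpha=alpha, max_iter=200).fit(returns).precision_
    zero_share = np.mean(np.abs(precision[off_diag_mask]) < 1e-8)
    print(f"alpha={alpha:<5} zero off-diagonal share: {zero_share:.0%}")

# Assumption: alpha chosen by cross-validated Gaussian log-likelihood (scikit-learn's default criterion).
cv_model = GraphicalLassoCV(cv=5).fit(returns)
w_glasso = min_variance_weights(cv_model.precision_)
w_sample = min_variance_weights(np.linalg.inv(np.cov(returns, rowvar=False)))

# Portfolio variance under the true covariance, compared with the true optimum 1 / (1' Sigma^-1 1).
true_min_var = 1.0 / np.sum(np.linalg.inv(true_cov))
for name, w in (("graphical lasso", w_glasso), ("sample covariance", w_sample)):
    print(f"{name:>17}: variance {w @ true_cov @ w:.4f} (optimum {true_min_var:.4f})")
```

The unconstrained weights depend on the covariance only through its inverse. The precision matrix returned by Graphical Lasso can therefore be used directly, with no further matrix inversion. The true portfolio variance printed at the end is available only because the data are simulated.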
We compare performance on synthetic and real stock market data against existing covariance estimators from the literature. The newly developed portfolio strategies perform well. However, the performance of all methods depends on the ratio between the length of the estimation period and the number of stocks, and on whether short-selling constraints are imposed.
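A small sketch can show why both the estimation-window-to-stocks ratio and the short-selling constraint matter. When the estimation window T is shorter than the number of stocks N, the sample covariance S has rank at most T − 1. The unconstrained weights S⁻¹1 / (1ᵀS⁻¹1) are then undefined. The long-only problem (w ≥ 0, weights summing to 1) still has a solution, although not necessarily a unique one. The sketch rests on these assumptions:

- Returns are synthetic, independent and unit-variance.
- SciPy's SLSQP solver acts as a generic quadratic-programming stand-in.
- `long_only_min_variance` is a hypothetical name.

```python
import numpy as np
from scipy.optimize import minimize


def long_only_min_variance(cov):
    """Hypothetical helper: minimum-variance weights with w >= 0 and sum(w) = 1 (no short selling)."""
    n = cov.shape[0]
    result = minimize(
        fun=lambda w: w @ cov @ w,       # portfolio variance w' S w
        x0=np.full(n, 1.0 / n),          # start from the equally weighted portfolio
        jac=lambda w: 2.0 * cov @ w,     # gradient of the variance
        bounds=[(0.0, 1.0)] * n,         # no short selling
        constraints=({"type": "eq", "fun": lambda w: w.sum() - 1.0},),  # fully invested
        method="SLSQP",
        options={"ftol": 1e-12, "maxiter": 1000},
    )
    return result.x


rng = np.random.default_rng(1)
n_stocks = 50
results = {}
for n_obs in (30, 500):  # estimation window shorter than / much longer than N
    returns = rng.standard_normal((n_obs, n_stocks))  # assumption: synthetic unit-variance returns
    sample_cov = np.cov(returns, rowvar=False)
    rank = np.linalg.matrix_rank(sample_cov)
    weights = long_only_min_variance(sample_cov)
    results[n_obs] = (rank, weights)
    print(f"T={n_obs:>3}, N={n_stocks}: rank(S)={rank}, "
          f"weights sum to {weights.sum():.4f}, smallest weight {weights.min():.4f}")
```

The solver does not depend on how the covariance was estimated. Swapping `sample_cov` for a regularized estimate, such as a Graphical Lasso covariance, is therefore a one-line change.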

References

[1] Olivier Ledoit et al. Honey, I Shrunk the Sample Covariance Matrix. 2003.

[2] J. Newton et al. Analysis of Microarray Gene Expression Data Using Machine Learning Techniques. 2002.

[3] Raman Uppal et al. A Generalized Approach to Portfolio Optimization: Improving Performance by Constraining Portfolio Norms. Manag. Sci., 2009.

[4] Larry Wasserman et al. All of Statistics. 2004.

[5] Chen-An Tsai et al. Gene selection for sample classifications in microarray experiments. DNA and Cell Biology, 2004.

[6] Olivier Ledoit et al. Nonlinear Shrinkage of the Covariance Matrix for Portfolio Selection: Markowitz Meets Goldilocks. 2017.

[7] Alexandre d'Aspremont et al. Model Selection Through Sparse Maximum Likelihood Estimation for Multivariate Gaussian or Binary Data. 2022.

[8] Guofu Zhou et al. Markowitz meets Talmud: A combination of sophisticated and naive diversification strategies. 2011.

[9] K. Sachs et al. Causal Protein-Signaling Networks Derived from Multiparameter Single-Cell Data. Science, 2005.

[10] Radford M. Neal. Pattern Recognition and Machine Learning. Technometrics, 2007.

[11] Jessika Weiss et al. Graphical Models in Applied Multivariate Statistics. 2016.

[12] M. Gerstein et al. Genomic analysis of gene expression relationships in transcriptional regulatory networks. Trends in Genetics, 2003.

[13] N. Meinshausen et al. High-dimensional graphs and variable selection with the Lasso. arXiv:math/0608017, 2006.

[14] R. E. Kalman et al. Algebraic Aspects of the Generalized Inverse of a Rectangular Matrix. 1976.

[15] Gabriel Frahm et al. Dominating estimators for minimum-variance portfolios. 2010.

[16] John R. M. Hand et al. The supraview of return predictive signals. 2013.

[17] Kevin Kontos et al. Gaussian graphical model selection for gene regulatory network reverse engineering and function prediction. 2009.

[18] P. Brown et al. Exploring the metabolic and genetic control of gene expression on a genomic scale. Science, 1997.

[19] Pedro Santa-Clara et al. Parametric Portfolio Policies: Exploiting Characteristics in the Cross Section of Equity Returns. 2004.

[20] R. C. Merton et al. On Estimating the Expected Return on the Market: An Exploratory Investigation. 1980.

[21] Marimuthu Palaniswami et al. Machine learning in low-level microarray analysis. SKDD, 2003.

[22] Olivier Ledoit et al. A well-conditioned estimator for large-dimensional covariance matrices. 2004.

[23] D. Botstein et al. Cluster analysis and display of genome-wide expression patterns. Proceedings of the National Academy of Sciences of the United States of America, 1998.

[24] R. Tibshirani et al. Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 2008.

[25] M. Yuan et al. Model selection and estimation in the Gaussian graphical model. 2007.

[26] Trevor Hastie et al. The Elements of Statistical Learning. 2001.

[27] Olivier Ledoit et al. Improved estimation of the covariance matrix of stock returns with an application to portfolio selection. 2003.

[28] R. Tibshirani. Regression Shrinkage and Selection via the Lasso. 1996.