A memory-based method to select the number of relevant components in principal component analysis

We propose a new data-driven method to select the optimal number of relevant components in Principal Component Analysis (PCA). The method applies to correlation matrices whose time autocorrelation function decays more slowly than an exponential, giving rise to long-memory effects. Unlike other methods available in the literature, our procedure does not rely on subjective evaluations and is computationally inexpensive. The basic idea is to use a suitable factor model to analyse the residual memory after sequentially removing more and more components, and to stop the process when the retained components account for the maximum amount of memory. We validate the methodology on both synthetic and real financial data and find, in all cases, a clear answer that is obtained at low computational cost and is entirely compatible with available heuristic criteria such as cumulative variance and cross-validation.
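The idea of tracking residual memory while components are removed can be illustrated with a minimal Python sketch. It is not the estimator used in the paper: the memory proxy (a sum of absolute sample autocorrelations up to `max_lag`), the plain projection used in place of a factor model, and the stopping tolerance `tol` are all assumptions made for illustration, and the function names are hypothetical.

```python
import numpy as np


def memory_proxy(x, max_lag):
    """Sum of absolute sample autocorrelations of a 1-D series, lags 1..max_lag.

    Assumed proxy for 'memory'; the paper may use a different estimator.
    """
    x = x - x.mean()
    denom = np.dot(x, x)
    if denom == 0.0:
        return 0.0
    total = sum(abs(np.dot(x[:-k], x[k:])) for k in range(1, max_lag + 1))
    return float(total / denom)


def residual_memory_curve(X, max_components, max_lag=50):
    """Average residual memory after removing the top m components, m = 0..max_components.

    X has shape (T, N): T observations of N variables.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0)  # standardise each column
    C = (Z.T @ Z) / Z.shape[0]                # sample correlation matrix
    _, eigvec = np.linalg.eigh(C)             # eigenvalues returned in ascending order
    V = eigvec[:, ::-1]                       # reorder columns by decreasing eigenvalue
    curve = []
    for m in range(max_components + 1):
        Vm = V[:, :m]
        R = Z - Z @ Vm @ Vm.T                 # residual after projecting out m components
        curve.append(np.mean([memory_proxy(R[:, i], max_lag) for i in range(R.shape[1])]))
    return np.array(curve)


def select_components(curve, tol=0.1):
    """Hypothetical stopping rule: the smallest m whose residual memory lies within
    a fraction tol of the total drop (curve[0] - min) above the minimum."""
    threshold = curve.min() + tol * (curve[0] - curve.min())
    return int(np.argmax(curve <= threshold))  # index of the first entry below threshold
```

Calling `residual_memory_curve(X, max_components=5)` on a `(T, N)` data matrix returns one residual-memory value per number of removed components, and `select_components` picks the point beyond which removing further components no longer reduces the residual memory appreciably. The tolerance-based rule is only one possible way to formalise "stopping when the maximum amount of memory has been accounted for".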
