The Five Trolls Under the Bridge: Principal Component Analysis With Asynchronous and Noisy High Frequency Data

We develop a principal component analysis (PCA) for high frequency data. As in Northern fairy tales, there are trolls waiting for the explorer. The first three trolls are market microstructure noise, asynchronous sampling times, and edge effects in estimators. To get around these, a robust estimator of the spot covariance matrix is developed based on the Smoothed TSRV (Mykland et al. (2017)). The fourth troll is how to pass from an estimated time-varying covariance matrix to PCA. Under finite dimensionality, we develop this methodology through the estimation of realized spectral functions. Rates of convergence and central limit theory, as well as an estimator of standard error, are established. The fifth troll is high dimension on top of high frequency, where we also develop PCA. With the help of a new identity concerning the spot principal orthogonal complement, we study the high-dimensional rates of convergence while relaxing several strong assumptions of classical PCA. As an application, we show that our first principal component (PC) potentially outperforms the S&P 100 market index, while three of the next four PCs are cointegrated with two of the Fama-French non-market factors.
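To make the fourth troll concrete, the following is a minimal sketch of how one might pass from local covariance estimates to integrated eigenvalues. It is not the paper's estimator: it assumes synchronous, noise-free returns (so it sidesteps the first three trolls), uses plain blockwise realized covariance in place of the Smoothed TSRV, and applies no bias correction. The function name `blockwise_realized_eigenvalues` and the parameter `block_size` are hypothetical and chosen only for illustration.

```python
import numpy as np


def blockwise_realized_eigenvalues(returns, block_size):
    """Sum, over non-overlapping blocks, the sorted eigenvalues of each
    block's realized covariance matrix.

    ASSUMPTION: `returns` is an (n, d) array of synchronous, noise-free
    log returns. This is an illustrative simplification, not the S-TSRV
    based estimator described in the abstract.
    """
    returns = np.asarray(returns, dtype=float)
    n, d = returns.shape
    n_blocks = n // block_size
    if n_blocks == 0:
        raise ValueError("block_size exceeds the number of observations")

    eigenvalues = np.empty((n_blocks, d))
    for b in range(n_blocks):
        block = returns[b * block_size:(b + 1) * block_size]
        # Realized covariance over the block: sum of outer products of returns.
        local_cov = block.T @ block
        # eigvalsh returns ascending eigenvalues of a symmetric matrix;
        # reverse so the first column is the leading eigenvalue.
        eigenvalues[b] = np.linalg.eigvalsh(local_cov)[::-1]

    # Summing block-level eigenvalues approximates the time integral of
    # each spot eigenvalue over the sampling period.
    return eigenvalues.sum(axis=0)


# Usage on simulated data (hypothetical sizes: 1000 observations, 5 assets).
rng = np.random.default_rng(0)
simulated_returns = rng.normal(scale=0.01, size=(1000, 5))
integrated_eigs = blockwise_realized_eigenvalues(simulated_returns, block_size=50)
```

Decomposing block by block, rather than decomposing one covariance matrix for the whole period, is what lets the eigenstructure vary over time; the cost is that each local eigenvalue estimate is noisier and, without further correction, biased.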

[1]  P. Mykland, et al. The Algebra of Two Scales Estimation, and the S-TSRV: High Frequency Estimation That Is Robust to Sampling Times, 2018, Journal of Econometrics.

[2]  M. Rothschild, et al. Arbitrage, Factor Structure, and Mean-Variance Analysis on Large Asset Markets, 1982.

[3]  T. W. Anderson. ASYMPTOTIC THEORY FOR PRINCIPAL COMPONENT ANALYSIS, 1963.

[4]  F. Dias, et al. Determining the number of factors in approximate factor models with global and group-specific factors, 2008.

[5]  Anja Vogler, et al. An Introduction to Multivariate Statistical Analysis, 2004.

[6]  Gregory Connor, et al. Performance Measurement with the Arbitrage Pricing Theory: A New Framework for Analysis, 1985.

[7]  W. Sharpe. CAPITAL ASSET PRICES: A THEORY OF MARKET EQUILIBRIUM UNDER CONDITIONS OF RISK*, 1964.

[8]  J. Stock, et al. Forecasting Using Principal Components From a Large Number of Predictors, 2002.

[9]  J. Lewellen. The Cross Section of Expected Stock Returns, 2014.

[10]  Markus Pelger. Large-Dimensional Factor Modeling Based on High-Frequency Observations, 2018.

[11]  J. Bai, et al. Determining the Number of Factors in Approximate Factor Models, 2000.

[12]  E. Fama, et al. International Tests of a Five-Factor Asset Pricing Model, 2015.

[13]  Jianqing Fan, et al. Incorporating Global Industrial Classification Standard into Portfolio Allocation: A Simple Factor-Based Large Covariance Matrix Estimator with High Frequency Data, 2015.

[14]  Enrique Sentana, et al. The Econometrics of Mean-Variance Efficiency Tests: A Survey, 2009.

[15]  W. Kahan, et al. The Rotation of Eigenvectors by a Perturbation. III, 1970.

[16]  H. Hotelling. Analysis of a complex of statistical variables into principal components, 1933.

[17]  Jianqing Fan, et al. Large covariance estimation by thresholding principal orthogonal complements, 2011, Journal of the Royal Statistical Society. Series B, Statistical methodology.

[18]  Dacheng Xiu, et al. Principal Component Analysis of High-Frequency Data, 2015, Journal of the American Statistical Association.

[19]  Xinbing Kong. On the number of common factors with high‐frequency data, 2017.

[20]  F. Black. Capital Market Equilibrium with Restricted Borrowing, 1972.

[21]  J. Stock, et al. Diffusion Indexes, 1998.

[22]  A. Lo, et al. THE ECONOMETRICS OF FINANCIAL MARKETS, 1996, Macroeconomic Dynamics.

[23]  Kun Lu, et al. Knowing Factors or Factor Loadings, or Neither? Evaluating Estimators of Large Covariance Matrices with Noisy and Asynchronous Data, 2017, Journal of Econometrics.

[24]  S. Friedland. Convex spectral functions, 1981.

[25]  I. Johnstone. On the distribution of the largest eigenvalue in principal components analysis, 2001.

[26]  Ke Yu, et al. Constraints, 2019, Sexual Selection.

[27]  P. Bickel, et al. Covariance regularization by thresholding, 2009, 0901.3079.

[28]  Karl Pearson F.R.S. LIII. On lines and planes of closest fit to systems of points in space, 1901.

[29]  Dacheng Xiu, et al. Using Principal Component Analysis to Estimate a High Dimensional Factor Model with High-Frequency Data, 2016.

[30]  Adam J. Rothman, et al. Generalized Thresholding of Large Covariance Matrices, 2009.

[31]  M. Avellaneda, et al. Statistical arbitrage in the US equities market, 2010.

[32]  Sheridan Titman, et al. On Persistence in Mutual Fund Performance, 1997.

[33]  J. A. Rodríguez, et al. Linear and Multilinear Algebra, 2007.

[34]  S. Ross. The arbitrage theory of capital asset pricing, 1976.

[35]  Jianqing Fan, et al. High dimensional covariance matrix estimation using a factor model, 2007, math/0701124.

[36]  Lan Zhang, et al. Assessment of Uncertainty in High Frequency Data: The Observed Asymptotic Variance, 2016.

[37]  E. Verriest, et al. On analyticity of functions involving eigenvalues, 1994.

[38]  G. Hunanyan, et al. Portfolio Selection, 2019, Finanzwirtschaft, Banken und Bankmanagement I Finance, Banks and Bank Management.

[39]  Jianqing Fan, et al. An Overview of the Estimation of Large Covariance and Precision Matrices, 2015, The Econometrics Journal.

[40]  J. Lintner. THE VALUATION OF RISK ASSETS AND THE SELECTION OF RISKY INVESTMENTS IN STOCK PORTFOLIOS AND CAPITAL BUDGETS, 1965.

[41]  Jianqing Fan, et al. High Dimensional Covariance Matrix Estimation in Approximate Factor Models, 2011, Annals of statistics.

[42]  T. W. Anderson, et al. An Introduction to Multivariate Statistical Analysis, 1959.

[43]  Weidong Liu, et al. Adaptive Thresholding for Sparse Covariance Matrix Estimation, 2011, 1102.2237.

[44]  P. Mykland, et al. ANOVA for diffusions and Itô processes, 2006, math/0611274.

[45]  W. Sharpe, et al. Capital Asset Prices: A Theory of Market Equilibrium under Conditions of Risk, 2007.

[46]  Eric R. Ziegel, et al. The Elements of Statistical Learning, 2003, Technometrics.

[47]  Gary James Jason, et al. The Logic of Scientific Discovery, 1988.

[48]  Markus Pelger. Understanding Systematic Risk: A High-Frequency Approach, 2019.

[49]  Zhou Zhou, et al. “A Tale of Two Time Scales: Determining Integrated Volatility with Noisy High-Frequency Data”, 2005.