Learning Latent Factors From Diversified Projections and Its Applications to Over-Estimated and Weak Factors

Abstract: Estimation and application of factor models often rely on the crucial condition that the number of latent factors is consistently estimated. This in turn requires that the factors be relatively strong, that the data be stationary and weakly serially dependent, and that the sample size be fairly large. In practical applications, one or several of these conditions may fail, and it then becomes difficult to analyze the eigenvectors of the data matrix. To address this issue, we propose simple estimators of the latent factors that use cross-sectional projections of the panel data, formed as weighted averages with predetermined weights. These weights are chosen to diversify away the idiosyncratic components, resulting in “diversified factors.” Because the projections are conducted cross-sectionally, they are robust to serial conditions, easy to analyze, and work even when the time series is of finite length. We formally prove that this procedure is robust to over-estimating the number of factors, and illustrate it in several applications, including post-selection inference, big data forecasts, large covariance estimation, and factor specification tests. We also recommend several choices for the diversified weights. Supplementary materials for this article are available online.
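
The projection described in the abstract can be illustrated with a minimal simulation sketch. Everything in it is an assumption made for illustration and is not taken from the article: the Gaussian data-generating process, the function names, the 1/N scaling of the weighted averages, and the use of a few held-out initial-period observations as the predetermined weights. The working number of factors (6) is deliberately larger than the true number (2), to mimic over-estimation.

```python
import numpy as np


def simulate_panel(N=1000, T=200, r=2, R=6, seed=0):
    """Simulate X = B F + U plus R held-out initial periods used as weights.

    Hypothetical setup: Gaussian loadings, factors, and idiosyncratic noise.
    """
    rng = np.random.default_rng(seed)
    B = rng.standard_normal((N, r))        # factor loadings
    F_all = rng.standard_normal((r, R + T))  # latent factors, all periods
    U_all = rng.standard_normal((N, R + T))  # idiosyncratic components
    X_all = B @ F_all + U_all
    W = X_all[:, :R]   # predetermined weights: initial-period observations
    X = X_all[:, R:]   # panel actually used for estimation
    F = F_all[:, R:]   # true factors over the estimation sample
    return X, W, F


def diversified_factors(X, W):
    """Cross-sectional weighted averages of the panel: (1/N) W'X, shape (R, T)."""
    N = X.shape[0]
    return W.T @ X / N


def span_r2(F, F_hat):
    """R^2 from regressing each true factor on all diversified factors."""
    coef, *_ = np.linalg.lstsq(F_hat.T, F.T, rcond=None)
    resid = F.T - F_hat.T @ coef
    total = F.T - F.T.mean(axis=0)
    return 1.0 - (resid ** 2).sum(axis=0) / (total ** 2).sum(axis=0)


if __name__ == "__main__":
    X, W, F = simulate_panel()
    F_hat = diversified_factors(X, W)
    print("diversified factors shape:", F_hat.shape)
    print("R^2 of true factors on diversified factors:", span_r2(F, F_hat))
```

In this sketch the weights are independent of the idiosyncratic components of the estimation sample, so the weighted averages shrink the idiosyncratic part at rate roughly 1/sqrt(N) while retaining the factor part. No eigen-decomposition is used, and the regression in `span_r2` only checks that the true factors lie approximately in the span of the six diversified factors.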
