Principal Components and Regularized Estimation of Factor Models

It is known that the common factors in a large panel of data can be consistently estimated by the method of principal components, and that principal components can be constructed by iterative least squares regressions. Replacing least squares with ridge regressions has the effect of shrinking the singular values of the common component and possibly reducing its rank. The method is used in the machine learning literature to recover low-rank matrices. We study the procedure from the perspective of estimating a minimum-rank approximate factor model. We show that the constrained factor estimates are biased but can be more efficient in terms of mean-squared error. Rank considerations suggest a data-dependent penalty for selecting the number of factors. The new criterion is more conservative when the nominal number of factors is inflated by the presence of weak factors or large measurement noise. The framework is extended to incorporate a priori linear constraints on the loadings. We provide asymptotic results that can be used to test economic hypotheses.
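The link between iterative ridge regressions and singular value shrinkage can be illustrated with a minimal numerical sketch. It is not the paper's estimator or normalization. It assumes the objective ||Z - F L'||_F^2 + gamma (||F||_F^2 + ||L||_F^2), alternates ridge regressions for F and L, and compares the fitted common component with a soft-thresholded SVD of Z. The function names, the toy singular values, and the value of gamma are all hypothetical choices made for illustration.

```python
import numpy as np

def ridge_als(Z, r, gamma, n_iter=1000, seed=0):
    """Alternate ridge regressions of Z on L (to get F) and of Z' on F (to get L)."""
    rng = np.random.default_rng(seed)
    L = rng.standard_normal((Z.shape[1], r))  # random starting loadings
    I = np.eye(r)
    for _ in range(n_iter):
        F = Z @ L @ np.linalg.inv(L.T @ L + gamma * I)    # ridge step for factors
        L = Z.T @ F @ np.linalg.inv(F.T @ F + gamma * I)  # ridge step for loadings
    return F, L

def soft_threshold_svd(Z, r, gamma):
    """Keep the top r singular values of Z, each reduced by gamma and floored at zero."""
    U, d, Vt = np.linalg.svd(Z, full_matrices=False)
    d_shrunk = np.maximum(d[:r] - gamma, 0.0)
    return (U[:, :r] * d_shrunk) @ Vt[:r]

# Toy panel with known singular values (hypothetical numbers):
# three "strong" directions and three "weak" ones.
rng = np.random.default_rng(1)
T, N = 50, 30
d_true = np.array([40.0, 25.0, 15.0, 4.0, 3.0, 2.0])
U, _ = np.linalg.qr(rng.standard_normal((T, d_true.size)))
V, _ = np.linalg.qr(rng.standard_normal((N, d_true.size)))
Z = (U * d_true) @ V.T

gamma, r = 5.0, 4  # r deliberately exceeds the number of strong directions
F, L = ridge_als(Z, r, gamma)
common_als = F @ L.T
common_svt = soft_threshold_svd(Z, r, gamma)

print(np.max(np.abs(common_als - common_svt)))      # discrepancy between the two fits
print(np.linalg.matrix_rank(common_als, tol=1e-6))  # rank after shrinkage
```

In this toy setup the fourth singular value (4) lies below gamma (5), so the penalty removes it and the fitted common component has rank 3 even though r = 4 was requested. The three retained singular values are each reduced by gamma, which illustrates why the regularized estimates are biased.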
