Double instrumental variable estimation of interaction models with big data

The factor analysis of an (n, m) matrix of observations Y by Principal Component Analysis (PCA) is based on the joint spectral decomposition of the matrix squares YY′ and Y′Y. For very large dimensions n and m, this approach is numerically costly. Big data settings therefore call for new estimation methods with lower numerical complexity. The double Instrumental Variable (IV) approach uses row and column instruments to consistently estimate the factors via an averaging method. We compare the double IV approach to PCA in terms of numerical complexity and statistical efficiency. The double IV approach can be used for the analysis of recommender systems and provides a new collaborative filtering approach.
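The contrast between the two approaches can be illustrated with a minimal sketch. It is not the estimator of the paper, whose details are not given in this abstract. It assumes a rank-one interaction model y_ij = a_i b_j + u_ij, a row instrument z correlated with the row factor a, and a column instrument w correlated with the column factor b. The function names, the simulated design, and the instruments themselves are hypothetical. Under these assumptions, instrument-weighted row and column averages of Y recover the factors up to scale with a few matrix-vector products, whereas PCA requires a singular value decomposition.

```python
import numpy as np

def double_iv_rank_one(Y, z, w):
    """Averaging-based fit of a rank-one signal (illustrative sketch only).

    z: hypothetical row instrument, length n, correlated with a.
    w: hypothetical column instrument, length m, correlated with b.
    """
    n, m = Y.shape
    row_part = Y @ w / m          # approximately a_i * E[w b]
    col_part = z @ Y / n          # approximately b_j * E[z a]
    scale = z @ Y @ w / (n * m)   # approximately E[z a] * E[w b]
    return np.outer(row_part, col_part) / scale

def pca_rank_one(Y):
    """Rank-one PCA fit from the leading singular triplet of Y."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return s[0] * np.outer(U[:, 0], Vt[0])

# Simulated data; every distributional choice here is an assumption.
rng = np.random.default_rng(0)
n, m = 400, 300
a = 1.0 + rng.normal(size=n)      # row factor
b = 1.0 + rng.normal(size=m)      # column factor
z = a + rng.normal(size=n)        # noisy proxy for a, independent of u
w = b + rng.normal(size=m)        # noisy proxy for b, independent of u
signal = np.outer(a, b)
Y = signal + rng.normal(size=(n, m))

def rel_error(fit):
    return np.linalg.norm(fit - signal) / np.linalg.norm(signal)

err_iv = rel_error(double_iv_rank_one(Y, z, w))
err_pca = rel_error(pca_rank_one(Y))
print(f"double IV relative error: {err_iv:.3f}")
print(f"PCA relative error:       {err_pca:.3f}")
```

In this sketch the averaging fit costs on the order of nm operations, while the full SVD is more expensive for large n and m. The printed errors give a rough sense of the efficiency side of the trade-off for this simulated design only.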
