Exploratory factor analysis of large data matrices

Many current applications involve data with many more variables than observations and require dimension reduction. Standard exploratory factor analysis (EFA) cannot be applied to such data. A generalized EFA (GEFA) model was recently proposed that handles both vertical data (fewer variables than observations) and horizontal data (more variables than observations). Its associated algorithm, GEFALS, is efficient but still cannot handle data with thousands of variables. The present work modifies GEFALS and proposes a new, much faster version, GEFAN. The speed-up is achieved by aligning the dimensions of the parameter matrices with their ranks, thereby avoiding redundant calculations. GEFALS and GEFAN are compared numerically on well-known data sets.
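The abstract does not give GEFAN's update formulas, so the Python/NumPy sketch below illustrates only the general principle of matching matrix dimensions to ranks, under stated assumptions. It assumes an orthogonal Procrustes-type subproblem: maximize tr(X'A) over n × m matrices X with orthonormal rows (XX' = I_n), where m = k + p exceeds n as in horizontal data. The maximizer is X = PQ' from the SVD A = PDQ'. Because rank(A) <= n, the thin SVD (n × n and n × m factors) yields the same X as the full SVD, which builds an m × m right factor whose extra m - n rows are never used. The helpers `procrustes_update` and `procrustes_update_full` are hypothetical names, and this is not the authors' GEFAN code.

```python
import numpy as np


def procrustes_update(A):
    """Hypothetical helper: maximize trace(X.T @ A) subject to X @ X.T = I_n.

    A is n-by-m with n <= m. With the thin SVD A = P diag(d) Qt, the maximizer
    is X = P @ Qt. The thin factors have inner dimension n (the rank bound),
    so nothing of size m-by-m is ever formed.
    """
    P, _, Qt = np.linalg.svd(A, full_matrices=False)  # P: n x n, Qt: n x m
    return P @ Qt


def procrustes_update_full(A):
    """Same update from the full SVD, for comparison (builds an m-by-m factor)."""
    n = A.shape[0]
    P, _, Qt = np.linalg.svd(A, full_matrices=True)  # Qt: m x m
    return P @ Qt[:n, :]  # only the first n right singular vectors are used


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, k, p = 20, 3, 500  # horizontal case: many more variables p than observations n
    A = rng.standard_normal((n, k + p))  # random stand-in for the assumed n x (k+p) matrix
    X = procrustes_update(A)
    print(X.shape)                                    # (20, 503)
    print(np.allclose(X @ X.T, np.eye(n)))            # True
    print(np.allclose(X, procrustes_update_full(A)))  # True
```

With p in the thousands, the full right factor alone has (k + p)^2 entries, whereas the thin one has n(k + p); skipping that unused part is the kind of redundancy that working at the rank dimension avoids.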
