Sparsest factor analysis for clustering variables: a matrix decomposition approach

We propose a new procedure for sparse factor analysis (FA) in which each variable loads on only one common factor. The loading matrix therefore has a single nonzero element in each row and zeros elsewhere. Such a loading matrix is the sparsest possible for a given number of variables and common factors, so we name the proposed method sparsest FA (SSFA). It may also be called FA-based variable clustering, since the variables that load on the same common factor can be grouped into one cluster. In SSFA, all model parts of FA (common factors, their correlations, loadings, unique factors, and unique variances) are treated as fixed unknown parameter matrices, and their least squares function is minimized through a specific data matrix decomposition. A useful feature of the algorithm is that the matrix of common factor scores is re-parameterized using the QR decomposition in order to estimate factor correlations efficiently. A simulation study shows that the proposed procedure can exactly identify the true sparsest models. Real data examples demonstrate the usefulness of the variable clustering performed by SSFA.
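
The two structural ideas in the abstract can be illustrated with a minimal NumPy sketch. It is not the SSFA algorithm itself, whose update steps are not given here. The function names `sparsest_loadings` and `factor_correlations_via_qr` are hypothetical. The first function imposes the one-nonzero-per-row pattern on an arbitrary loading matrix by keeping the largest-magnitude entry in each row, which is an assumption made purely for illustration. The second shows the identity behind a QR re-parameterization: if the centered score matrix is F = QR with orthonormal Q, then F'F = R'R, so factor correlations can be read off the small triangular factor R.

```python
import numpy as np

def sparsest_loadings(loadings):
    """Keep the largest-magnitude entry in each row and zero the rest.

    Illustrative assumption: the paper estimates the pattern by least
    squares, not by thresholding an existing loading matrix.
    """
    loadings = np.asarray(loadings, dtype=float)
    # Index of the retained factor per variable doubles as its cluster label.
    cluster = np.argmax(np.abs(loadings), axis=1)
    sparse = np.zeros_like(loadings)
    rows = np.arange(loadings.shape[0])
    sparse[rows, cluster] = loadings[rows, cluster]
    return sparse, cluster

def factor_correlations_via_qr(scores):
    """Correlations among factor-score columns computed from the R factor.

    With centered scores F = Q R and Q'Q = I, F'F = R'R.
    """
    scores = np.asarray(scores, dtype=float)
    centered = scores - scores.mean(axis=0)
    _, r = np.linalg.qr(centered)          # reduced QR: r is m x m
    cross = r.T @ r                        # equals centered.T @ centered
    scale = np.sqrt(np.diag(cross))
    return cross / np.outer(scale, scale)  # rescale to unit diagonal

# Usage: three variables, two factors.
L = np.array([[0.8, 0.1], [0.2, -0.7], [0.6, 0.3]])
sparse_L, clusters = sparsest_loadings(L)
```

In this usage example, variables 1 and 3 fall into the cluster of the first factor and variable 2 into the cluster of the second, which is the sense in which a sparsest loading matrix clusters variables.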
