Latent variable selection in structural equation models

Structural equation models (SEMs) are often formulated using a prespecified parametric structural equation. In many applications, however, the formulation of the structural equation is unknown, and its misspecification may lead to unreliable statistical inference. This paper develops a general SEM in which latent variables are linearly regressed on themselves, thereby avoiding the need to specify outcome/explanatory latent variables. A penalized likelihood method with a proper penalty function is proposed to simultaneously select latent variables and estimate the coefficient matrix in formulating the structural equation. Under some regularity conditions, we show the consistency and the oracle property of the proposed estimators. We also develop an expectation/conditional maximization (ECM) algorithm involving a minorization–maximization algorithm that facilitates the second M-step. Simulation studies are performed and a real data set is analyzed to illustrate the proposed methods.
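To illustrate how a minorization-maximization step can handle a nonconcave penalty inside an M-step, here is a minimal sketch in plain Python. It is not the paper's algorithm. It assumes the SCAD penalty (the abstract says only "a proper penalty function") and a scalar penalized least-squares objective, 0.5 * (z - b)^2 + p(|b|), in place of the full SEM likelihood. The penalty is replaced by a perturbed local quadratic bound, so each update has a closed form. The function names, the perturbation `eps`, and the default `a = 3.7` are illustrative assumptions.

```python
def scad_derivative(t, lam, a=3.7):
    """Derivative of the SCAD penalty at t >= 0 (assumed penalty, a > 2)."""
    if t <= lam:
        return lam
    # Decays linearly to zero on (lam, a * lam], then stays at zero.
    return max(a * lam - t, 0.0) / (a - 1.0)


def mm_penalized_scalar(z, lam, a=3.7, eps=1e-8, n_iter=200):
    """Minimize 0.5 * (z - b)**2 + SCAD(|b|) by iterating quadratic surrogates.

    At the current iterate b0 the penalty is bounded above by a quadratic
    in b with curvature w = p'(|b0|) / (|b0| + eps), so the surrogate
    minimizer is b = z / (1 + w).
    """
    b = z  # start from the unpenalized solution
    for _ in range(n_iter):
        w = scad_derivative(abs(b), lam, a) / (abs(b) + eps)
        b = z / (1.0 + w)
    return b


if __name__ == "__main__":
    for z in (0.5, 2.5, 5.0):
        print(z, mm_penalized_scalar(z, lam=1.0))
```

In this toy setting, small inputs are shrunk to within roughly `eps` of zero, while inputs larger than `a * lam` are left unpenalized, which is the behavior that motivates oracle-type results. In practice the near-zero values would be set exactly to zero by a threshold.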
