Sparse Extended Redundancy Analysis: Variable Selection via the Exclusive LASSO

Extended Redundancy Analysis (ERA) is a statistical tool for exploring the directional effects of multiple sets of exogenous variables on a set of endogenous variables. The approach posits that the endogenous and exogenous variables are related via latent components, each extracted from one set of exogenous variables so as to account for the maximum variation in the endogenous variables. However, it is often difficult to distinguish the true variables that form the latent components from the false variables that do not, especially when the association between the true variables and the exogenous set is weak. To overcome this limitation, we propose Sparse Extended Redundancy Analysis via the Exclusive LASSO, which performs variable selection while maintaining the model specification. We validate the performance of the proposed approach in a simulation study. Finally, we demonstrate its empirical utility through two examples: one from a study of youth academic achievement and the other from a text analysis of newspaper data.
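As an illustration only, the following Python sketch shows one way the two ingredients named above could fit together: latent components formed as weighted sums of each exogenous set, and an exclusive LASSO penalty, taken here in its generic form as the sum over sets of the squared L1 norm of that set's weights. The abstract does not state the objective function or its scaling, so the function names, the least-squares loss, the factor of 1/2, and the tuning parameter `lam` are all assumptions rather than the authors' implementation. Because the L1 norm is squared within each set, the penalty tends to zero out some weights inside a set without removing the whole set, which is consistent with selecting variables while keeping every component in the model.

```python
import numpy as np


def exclusive_lasso_penalty(weights, groups):
    """Sum over groups of the squared L1 norm of that group's weights."""
    weights = np.asarray(weights, dtype=float)
    groups = np.asarray(groups)
    return sum(np.abs(weights[groups == g]).sum() ** 2 for g in np.unique(groups))


def era_components(X, weights, groups):
    """Build one latent component per group as a weighted sum of its columns."""
    X = np.asarray(X, dtype=float)
    weights = np.asarray(weights, dtype=float)
    groups = np.asarray(groups)
    return np.column_stack(
        [X[:, groups == g] @ weights[groups == g] for g in np.unique(groups)]
    )


def penalized_loss(X, Y, weights, groups, A, lam):
    """Assumed objective: least-squares fit of Y on the components plus penalty.

    A holds the regression coefficients of the endogenous variables on the
    components (shape: n_groups x n_endogenous); lam is a hypothetical
    tuning parameter controlling the amount of shrinkage.
    """
    F = era_components(X, weights, groups)
    residual = np.asarray(Y, dtype=float) - F @ np.asarray(A, dtype=float)
    fit = 0.5 * np.sum(residual ** 2)
    return fit + 0.5 * lam * exclusive_lasso_penalty(weights, groups)


# Toy usage: four exogenous variables split into two sets of two.
groups = np.array([0, 0, 1, 1])
weights = np.array([1.0, -2.0, 0.0, 3.0])
X = np.ones((3, 4))
F = era_components(X, weights, groups)  # shape (3, 2), one column per set
```

In this toy setup the third variable has weight zero, so it drops out of the second component while the component itself remains, which is the kind of within-set sparsity the penalty is meant to encourage. Minimizing such a loss over `weights` and `A` would need an optimizer, which is outside the scope of this sketch.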
