Learning the Structure of Linear Latent Variable Models

We describe anytime search procedures that (1) find disjoint subsets of recorded variables for which the members of each subset are d-separated by a single common unrecorded cause, if such a cause exists; and (2) return information about the causal relations among the latent factors so identified. We prove the procedure is point-wise consistent assuming (a) the causal relations can be represented by a directed acyclic graph (DAG) satisfying the Markov Assumption and the Faithfulness Assumption; (b) unrecorded variables are not caused by recorded variables; and (c) dependencies are linear. We compare the procedure with standard approaches over a variety of simulated structures and sample sizes, and illustrate its practical value with brief studies of social science data sets. Finally, we consider generalizations for non-linear systems.
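To make the idea of a "single common unrecorded cause" concrete, the sketch below illustrates one well-known consequence of linearity: when four recorded variables share exactly one latent parent and are otherwise unrelated, all three tetrad differences of their covariance matrix vanish. The abstract does not spell out which tests the search procedures use, so this is only an illustration under that assumption, not the paper's algorithm. The function names, loadings, and noise variances are hypothetical, and the covariances are population values computed from the model rather than estimated from data.

```python
import numpy as np


def implied_covariance(loadings, latent_cov, noise_var):
    """Population covariance of X = loadings @ L + noise in a linear latent model."""
    loadings = np.asarray(loadings, dtype=float)
    latent_cov = np.asarray(latent_cov, dtype=float)
    return loadings @ latent_cov @ loadings.T + np.diag(noise_var)


def tetrad_differences(cov, i, j, k, l):
    """Return the three tetrad differences among variables i, j, k, l."""
    return (
        cov[i, j] * cov[k, l] - cov[i, k] * cov[j, l],
        cov[i, j] * cov[k, l] - cov[i, l] * cov[j, k],
        cov[i, k] * cov[j, l] - cov[i, l] * cov[j, k],
    )


# Hypothetical example 1: one latent cause with four indicators.
one_factor = implied_covariance(
    loadings=[[0.9], [0.8], [0.7], [0.6]],
    latent_cov=[[1.0]],
    noise_var=[0.3, 0.4, 0.5, 0.6],
)

# Hypothetical example 2: two correlated latents with two indicators each.
two_factor = implied_covariance(
    loadings=[[0.9, 0.0], [0.8, 0.0], [0.0, 0.7], [0.0, 0.6]],
    latent_cov=[[1.0, 0.5], [0.5, 1.0]],
    noise_var=[0.3, 0.4, 0.5, 0.6],
)

one_factor_tetrads = tetrad_differences(one_factor, 0, 1, 2, 3)
two_factor_tetrads = tetrad_differences(two_factor, 0, 1, 2, 3)

print("one latent: ", np.round(one_factor_tetrads, 6))
print("two latents:", np.round(two_factor_tetrads, 6))
```

In the one-latent example all three differences are zero up to floating-point error, while in the two-latent example only one of them is. With real data the covariances would be sample estimates, so the differences would have to be assessed with a statistical test rather than compared to zero directly.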
