Sparse Linear Identifiable Multivariate Modeling

In this paper we consider sparse and identifiable linear latent variable (factor) and linear Bayesian network models for parsimonious analysis of multivariate data. We propose a computationally efficient method for joint parameter and model inference, and for model comparison. It consists of a fully Bayesian hierarchy for sparse models using slab and spike priors (two-component δ-function and continuous mixtures), non-Gaussian latent factors and a stochastic search over the ordering of the variables. The framework, which we call SLIM (Sparse Linear Identifiable Multivariate modeling), is validated and benchmarked on artificial and real biological data sets. SLIM is closest in spirit to LiNGAM (Shimizu et al., 2006), but differs substantially in inference, Bayesian network structure learning and model comparison. Experimentally, SLIM performs as well as or better than LiNGAM with comparable computational complexity. We attribute this mainly to the stochastic search strategy used, and to parsimony (sparsity and identifiability), which is an explicit part of the model. We propose two extensions to the basic i.i.d. linear framework: non-linear dependence on observed variables, called SNIM (Sparse Non-linear Identifiable Multivariate modeling), and correlations between latent variables, called CSLIM (Correlated SLIM), for temporal and/or spatial data. The source code and scripts are available from http://cogsys.imm.dtu.dk/slim/.
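To make the two-component slab and spike prior with a δ-function spike concrete, the following is a minimal sketch of drawing a sparse loading matrix and generating data from a linear factor model with non-Gaussian latent factors. It is not the SLIM implementation: the function name `sample_spike_slab`, the Gaussian slab, the Laplace-distributed factors, and all dimensions and hyperparameter values are assumptions chosen only for illustration.

```python
import numpy as np


def sample_spike_slab(shape, inclusion_prob, slab_scale, rng):
    """Draw weights from a two-component mixture (hypothetical helper).

    Each entry is exactly zero with probability 1 - inclusion_prob
    (the delta-function spike) and otherwise drawn from a zero-mean
    Gaussian with standard deviation slab_scale (the slab, an assumption).
    """
    mask = rng.random(shape) < inclusion_prob        # True -> slab, False -> spike
    slab = rng.normal(0.0, slab_scale, size=shape)   # candidate non-zero values
    return np.where(mask, slab, 0.0), mask


rng = np.random.default_rng(0)
d, m, n = 5, 3, 200  # observed variables, latent factors, samples (illustrative)

# Sparse loading matrix: most entries are switched off by the spike.
C, mask = sample_spike_slab((d, m), inclusion_prob=0.3, slab_scale=1.0, rng=rng)

# Heavy-tailed (non-Gaussian) latent factors; Laplace is one possible choice.
Z = rng.laplace(0.0, 1.0, size=(m, n))

# Observations from the linear model X = C Z + Gaussian noise.
noise = rng.normal(0.0, 0.1, size=(d, n))
X = C @ Z + noise
```

In this sketch the binary `mask` plays the role of the model structure: inferring which entries are non-zero amounts to inferring which connections exist, which is why a sparsity prior of this kind ties parameter inference and model inference together.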

[1]  D. Dey,et al.  A General Class of Multivariate Skew-Elliptical Distributions , 2001 .

[2]  Calyampudi R. Rao,et al.  Characterization Problems in Mathematical Statistics , 1976 .

[3]  D. F. Andrews,et al.  Scale Mixtures of Normal Distributions , 1974 .

[4]  G. Casella,et al.  The Bayesian Lasso , 2008 .

[5]  Aapo Hyvärinen,et al.  A Linear Non-Gaussian Acyclic Model for Causal Discovery , 2006, J. Mach. Learn. Res..

[6]  Bernhard Schölkopf,et al.  Nonlinear causal discovery with additive noise models , 2008, NIPS.

[7]  Le Song,et al.  A Kernel Statistical Test of Independence , 2007, NIPS.

[8]  Arthur Gretton,et al.  Nonlinear directed acyclic structure learning with weakly additive noise models , 2009, NIPS.

[9]  S. Chib Marginal Likelihood from the Gibbs Output , 1995 .

[10]  Hal Daumé,et al.  The Infinite Hierarchical Factor Regression Model , 2008, NIPS.

[11]  Katy C. Kao,et al.  Transcriptome-based determination of multiple transcription regulator activities in Escherichia coli by using network component analysis. , 2004, Proceedings of the National Academy of Sciences of the United States of America.

[12]  David Maxwell Chickering,et al.  Learning Bayesian Networks is , 1994 .

[13]  Julio Collado-Vides,et al.  RegulonDB (version 6.0): gene regulation model of Escherichia coli K-12 beyond transcription, active (experimental) annotated promoters and Textpresso navigation , 2007, Nucleic Acids Res..

[14]  Volker Tresp,et al.  Robust multi-task learning with t-processes , 2007, ICML '07.

[15]  P. Green,et al.  Decomposable graphical Gaussian model determination , 1999 .

[16]  David Maxwell Chickering,et al.  Dependency Networks for Inference, Collaborative Filtering, and Data Visualization , 2000, J. Mach. Learn. Res..

[17]  K. Gaver,et al.  Posterior probabilities of alternative linear models , 1971 .

[18]  R. Tibshirani,et al.  Sparse Principal Component Analysis , 2006 .

[19]  Pierre Comon,et al.  Independent component analysis, A new concept? , 1994, Signal Process..

[20]  T. Berge,et al.  Generic global identification in factor analysis , 1997 .

[21]  K. Sachs,et al.  Causal Protein-Signaling Networks Derived from Multiparameter Single-Cell Data , 2005, Science.

[22]  Daphne Koller,et al.  Ordering-Based Search: A Simple and Effective Algorithm for Learning Bayesian Networks , 2005, UAI.

[23]  Rick L. Edgeman The Inverse Gaussian Distribution: Theory, Methodology, and Applications , 1989 .

[24]  Michael A. West,et al.  BAYESIAN MODEL ASSESSMENT IN FACTOR ANALYSIS , 2004 .

[25]  R. Tibshirani Regression Shrinkage and Selection via the Lasso , 1996 .

[26]  Nir Friedman,et al.  Being Bayesian About Network Structure. A Bayesian Approach to Structure Discovery in Bayesian Networks , 2004, Machine Learning.

[27]  J. Leslie The Inverse Gaussian Distribution: Theory, Methodology, and Applications , 1990 .

[28]  I. Jolliffe,et al.  A Modified Principal Component Technique Based on the LASSO , 2003 .

[29]  J. Geweke,et al.  Variable selection and model comparison in regression , 1994 .

[30]  Patrik O. Hoyer,et al.  Estimation of causal effects using linear non-Gaussian causal models with hidden variables , 2008, Int. J. Approx. Reason..

[31]  Bernhard Schölkopf,et al.  Inferring deterministic causal relations , 2010, UAI.

[32]  J. Pearl Causality: Models, Reasoning and Inference , 2000 .

[33]  Michael I. Jordan,et al.  Hierarchical Beta Processes and the Indian Buffet Process , 2007, AISTATS.

[34]  M. West On scale mixtures of normal distributions , 1987 .

[35]  Aapo Hyvärinen,et al.  Nonlinear acyclic causal models , 2010, NIPS Causality: Objectives and Assessment.

[36]  Gregory F. Cooper,et al.  A Bayesian Method for the Induction of Probabilistic Networks from Data , 1992 .

[37]  J. Ord,et al.  Characterization Problems in Mathematical Statistics , 1975 .

[38]  Mark W. Schmidt,et al.  Learning Graphical Model Structure Using L1-Regularization Paths , 2007, AAAI.

[39]  Constantin F. Aliferis,et al.  The max-min hill-climbing Bayesian network structure learning algorithm , 2006, Machine Learning.

[40]  Nir Friedman,et al.  Gaussian Process Networks , 2000, UAI.

[41]  Ole Winther,et al.  Bayesian Sparse Factor Models and DAGs Inference and Comparison , 2009, NIPS.

[42]  A. Bowman,et al.  A look at some data on the old faithful geyser , 1990 .

[43]  Ping Ma,et al.  Bayesian Inference for Gene Expression and Proteomics , 2007, Briefings Bioinform..

[44]  Nir Friedman,et al.  "Ideal Parent" Structure Learning for Continuous Variable Bayesian Networks , 2007, J. Mach. Learn. Res..

[45]  E. George,et al.  Journal of the American Statistical Association is currently published by American Statistical Association. , 2007 .

[46]  Radford M. Neal Annealed importance sampling , 1998, Stat. Comput..

[47]  T. J. Mitchell,et al.  Bayesian Variable Selection in Linear Regression , 1988 .

[48]  M. West,et al.  High-Dimensional Sparse Factor Modeling: Applications in Gene Expression Genomics , 2008, Journal of the American Statistical Association.

[49]  P. Spirtes,et al.  Causation, prediction, and search , 1993 .

[50]  A. Dawid,et al.  Hyper Markov Laws in the Statistical Analysis of Decomposable Graphical Models , 1993 .

[51]  Iain Murray Advances in Markov chain Monte Carlo methods , 2007 .

[52]  Pierre Comon Independent component analysis - a new concept? signal processing , 1994 .

[53]  Matthew West,et al.  Bayesian factor regression models in the''large p , 2003 .

[54]  J. S. Rao,et al.  Spike and slab variable selection: Frequentist and Bayesian strategies , 2005, math/0505633.

[55]  Nir Friedman,et al.  Learning Bayesian Network Structure from Massive Datasets: The "Sparse Candidate" Algorithm , 1999, UAI.

[56]  Aapo Hyvärinen,et al.  On the Identifiability of the Post-Nonlinear Causal Model , 2009, UAI.

[57]  J. Robins,et al.  Uniform consistency in causal inference , 2003 .

[58]  Michael I. Jordan,et al.  Bayesian Nonparametric Latent Feature Models , 2011 .

[59]  R. P. McDonald,et al.  Bayesian estimation in unrestricted factor analysis: A treatment for heywood cases , 1975 .

[60]  E. Oja,et al.  Independent Component Analysis , 2013 .

[61]  Joris M. Mooij,et al.  Distinguishing between cause and effect , 2008, NIPS 2008.

[62]  A. Pettitt,et al.  Marginal likelihood estimation via power posteriors , 2008 .

[63]  Zoubin Ghahramani,et al.  Infinite Sparse Factor Analysis and Infinite Independent Components Analysis , 2007, ICA.

[64]  Carlos M. Carvalho,et al.  FLEXIBLE COVARIANCE ESTIMATION IN GRAPHICAL GAUSSIAN MODELS , 2008, 0901.3267.

[65]  T. Griffiths,et al.  Bayesian nonparametric latent feature models , 2007 .