Switching Regression Models and Causal Inference in the Presence of Discrete Latent Variables

Given a response $Y$ and a vector $X = (X^1, \dots, X^d)$ of $d$ predictors, we investigate the problem of inferring the direct causes of $Y$ among the components of $X$. Models for $Y$ that use all of its causal covariates as predictors are invariant across different environments or interventional settings. Given data from such environments, this property has been exploited for causal discovery. Here, we extend this inference principle to situations in which some (discrete-valued) direct causes of $Y$ are unobserved. Such cases naturally give rise to switching regression models. We provide sufficient conditions for the existence, consistency and asymptotic normality of the maximum likelihood estimator (MLE) in linear switching regression models with Gaussian noise, and construct a test for the equality of such models. These results allow us to prove that the proposed causal discovery method obtains asymptotic false discovery control under mild conditions. We provide an algorithm, make code available, and test our method on simulated data, where it is robust against model violations and outperforms state-of-the-art approaches. We further apply our method to a real data set, where we show that it outputs not only causal predictors but also a process-based clustering of data points, which could be of additional interest to practitioners.
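To make the notion of a linear switching regression model with Gaussian noise concrete, the following is a minimal sketch, not the authors' released code. It assumes a single predictor, a binary hidden variable drawn independently for each data point, and a plain EM fit of a two-component mixture of linear regressions; the function names `simulate_switching_regression` and `fit_em`, the simulation parameters, and the random initialization are all illustrative assumptions. It does not implement the invariance test or the causal discovery procedure.

```python
import numpy as np

def simulate_switching_regression(n, rng):
    """Draw (x, y, h): y follows one of two linear regimes selected by hidden h."""
    x = rng.normal(size=n)
    h = rng.integers(0, 2, size=n)           # unobserved discrete cause of y
    slopes = np.array([2.0, -2.0])           # illustrative regime-specific slopes
    y = slopes[h] * x + 0.3 * rng.normal(size=n)
    return x, y, h

def fit_em(x, y, k=2, iters=200, seed=0):
    """EM for a k-component mixture of linear regressions with Gaussian noise."""
    rng = np.random.default_rng(seed)
    n = len(y)
    X = np.column_stack([np.ones(n), x])     # design matrix with intercept
    resp = rng.dirichlet(np.ones(k), size=n)  # random initial responsibilities
    coefs = np.zeros((k, X.shape[1]))
    sig2 = np.ones(k)
    weights = np.full(k, 1.0 / k)
    for _ in range(iters):
        # M-step: weighted least squares, noise variance, and mixing weight per regime
        for j in range(k):
            w = resp[:, j]
            Xw = X * w[:, None]
            coefs[j] = np.linalg.solve(X.T @ Xw, Xw.T @ y)
            resid = y - X @ coefs[j]
            sig2[j] = max((w * resid**2).sum() / w.sum(), 1e-8)
            weights[j] = w.mean()
        # E-step: posterior probability of each regime for each data point
        resid = y[:, None] - X @ coefs.T
        logp = np.log(weights) - 0.5 * np.log(2 * np.pi * sig2) - 0.5 * resid**2 / sig2
        logp -= logp.max(axis=1, keepdims=True)  # stabilize before exponentiating
        resp = np.exp(logp)
        resp /= resp.sum(axis=1, keepdims=True)
    return coefs, np.sqrt(sig2), weights, resp

rng = np.random.default_rng(1)
x, y, h = simulate_switching_regression(2000, rng)
coefs, sigmas, weights, resp = fit_em(x, y)
slopes = np.sort(coefs[:, 1])                # estimated regime slopes, ascending
labels = resp.argmax(axis=1)                 # hard clustering of data points by regime
```

The responsibilities `resp` assign each data point to a regime, which is the kind of clustering by data-generating process mentioned in the abstract. The component labels are only identified up to permutation, so any comparison with the true hidden variable has to allow for label swapping.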
