Switching Regression Models and Causal Inference in the Presence of Latent Variables

Given a response $Y$ and a vector $X = (X^1, \dots, X^d)$ of $d$ predictors, we investigate the problem of inferring the direct causes of $Y$ among the components of $X$. Models for $Y$ that use all of its causal covariates as predictors are invariant across different environments or interventional settings. Given data from such environments, this property has been exploited for causal discovery. Here, we extend this inference principle to situations in which some (discrete-valued) direct causes of $Y$ are unobserved. Such cases naturally give rise to switching regression models. We provide sufficient conditions for the existence, consistency, and asymptotic normality of the maximum likelihood estimator (MLE) in linear switching regression models with Gaussian noise, and construct a test for the equality of such models. These results allow us to prove that the proposed causal discovery method achieves asymptotic false discovery control under mild conditions. We provide an algorithm, make code available, and test our method on simulated data, where it is robust against model violations and outperforms state-of-the-art approaches. We further apply our method to a real data set, where we show that it outputs not only causal predictors but also a process-based clustering of data points, which may be of additional interest to practitioners.
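To make the term "switching regression model" concrete, the following is a minimal sketch and not the authors' algorithm or released code. It assumes a hidden binary variable $H$ that is i.i.d. across observations, a single predictor, and the model $Y = \mu_H + \beta_H X + \sigma_H \varepsilon$ with standard Gaussian $\varepsilon$. The function names `simulate` and `fit_em`, the parameter values, and the random initialisation are all illustrative assumptions. The fit is a plain EM iteration for a Gaussian mixture of linear regressions; it returns no standard errors and includes no test for the equality of models, and it may stop at a local optimum.

```python
import numpy as np

def simulate(n=1000, seed=0):
    """Draw (x, y, h) from a hypothetical two-regime linear switching regression."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    h = rng.integers(0, 2, size=n)               # hidden regime, not used by the fit
    coef = np.array([[1.0, 2.0], [-1.0, -2.0]])  # (intercept, slope) per regime
    y = coef[h, 0] + coef[h, 1] * x + 0.3 * rng.normal(size=n)
    return x, y, h

def fit_em(x, y, n_regimes=2, n_iter=100, seed=1):
    """EM for a Gaussian mixture of linear regressions (i.i.d. hidden regime assumed)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    X = np.column_stack([np.ones(n), x])               # design matrix with intercept
    resp = rng.dirichlet(np.ones(n_regimes), size=n)   # random initial responsibilities
    loglik = []
    for _ in range(n_iter):
        # M-step: mixing weights, then weighted least squares in each regime.
        pi = resp.mean(axis=0)
        beta = np.empty((n_regimes, X.shape[1]))
        sigma2 = np.empty(n_regimes)
        for k in range(n_regimes):
            w = resp[:, k]
            XtW = X.T * w
            beta[k] = np.linalg.solve(XtW @ X, XtW @ y)
            sigma2[k] = np.sum(w * (y - X @ beta[k]) ** 2) / np.sum(w)
        # E-step: posterior regime probabilities, computed in log space.
        resid = y[:, None] - X @ beta.T
        log_joint = np.log(pi) - 0.5 * (np.log(2 * np.pi * sigma2) + resid ** 2 / sigma2)
        m = log_joint.max(axis=1, keepdims=True)
        log_marg = m + np.log(np.exp(log_joint - m).sum(axis=1, keepdims=True))
        resp = np.exp(log_joint - log_marg)
        loglik.append(float(log_marg.sum()))           # log-likelihood at current parameters
    return beta, sigma2, pi, resp, loglik

x, y, h = simulate()
beta, sigma2, pi, resp, loglik = fit_em(x, y)
labels = resp.argmax(axis=1)   # hard regime assignment for each data point
```

The hard assignments in `labels` give a clustering of the data points by estimated regime, which is the kind of by-product the abstract refers to as a process-based clustering. The regime labels are identifiable only up to permutation, so a fitted regime 0 need not match the simulated regime 0.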
