The Temporal Logic of Causal Structures

Computational analysis of time-course data with an underlying causal structure is needed in a variety of domains, including neural spike trains, stock price movements, and gene expression levels. However, it can be challenging to determine from the numerical time-course data alone what is coordinating the visible processes, to separate the underlying prima facie causes into genuine and spurious causes, and to do so with feasible computational complexity. For this purpose, we have been developing a novel algorithm based on a framework that combines philosophical notions of causality with algorithmic approaches built on model checking and statistical techniques for multiple hypothesis testing. Causal relationships are described as temporal logic formulae, which recasts the inference problem as one of model checking. The logic used, PCTL, can express both the time between cause and effect and the probability with which the relationship is observed. We show that, given these causal formulae and their associated probabilities, we can compute the average impact a cause has on its effect. We can then discover statistically significant causes using multiple hypothesis testing (treating each causal relationship as a hypothesis) together with false discovery rate control. By exploring a well-chosen family of hypotheses that potentially includes all significant ones while keeping description length reasonably minimal, it is possible to tame the algorithm's computational complexity while still exploring nearly the complete search space of prima facie causes. We have tested these ideas in a number of domains and illustrate them here with two examples.
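The abstract compresses two of the core steps into single sentences. The sketch below illustrates them on binary time series. The first step tests whether c is a prima facie cause of e, meaning P(e | c) > P(e). Here the effect must occur between r and s time units after c, which mirrors the time bounds of a PCTL leads-to formula. The second step computes the average impact of c as the mean, over the other prima facie causes x, of P(e | c ∧ x) − P(e | ¬c ∧ x). This is one common way to formalize "average impact", and it is the one assumed here. Several choices are assumptions made for illustration and are not the authors' implementation: the function names, the frequency-based probability estimates, and evaluating c and x at the same time point. The false discovery control step is not shown.

```python
from typing import Dict, List, Optional, Sequence

# A binary time series: series[t] is 1 if the proposition holds at time t.
Series = Sequence[int]


def effect_in_window(e: Series, t: int, r: int, s: int) -> bool:
    """True if e holds at some time point in the window [t + r, t + s]."""
    return any(e[u] for u in range(t + r, t + s + 1))


def cond_prob(e: Series, mask: Sequence[object], r: int, s: int) -> Optional[float]:
    """Frequency estimate of P(e occurs within [t+r, t+s] | mask[t]).

    Only time points t whose whole window lies inside the series are used.
    Returns None when the condition never holds at such a t.
    """
    ts = [t for t in range(len(e) - s) if mask[t]]
    if not ts:
        return None
    return sum(effect_in_window(e, t, r, s) for t in ts) / len(ts)


def is_prima_facie(c: Series, e: Series, r: int, s: int) -> bool:
    """c is a prima facie cause of e if it raises e's probability: P(e | c) > P(e)."""
    p_e_given_c = cond_prob(e, c, r, s)
    p_e = cond_prob(e, [1] * len(e), r, s)  # unconditional base rate of e
    return p_e_given_c is not None and p_e is not None and p_e_given_c > p_e


def eps_avg(c_name: str, e: Series, data: Dict[str, Series],
            causes: List[str], r: int, s: int) -> Optional[float]:
    """Mean of P(e | c & x) - P(e | ~c & x) over the other prima facie causes x.

    Simplifying assumption: c and x are both evaluated at the same time point t.
    """
    c = data[c_name]
    diffs = []
    for x_name in causes:
        if x_name == c_name:
            continue
        x = data[x_name]
        with_c = [bool(ci) and bool(xi) for ci, xi in zip(c, x)]
        without_c = [(not ci) and bool(xi) for ci, xi in zip(c, x)]
        p1 = cond_prob(e, with_c, r, s)
        p0 = cond_prob(e, without_c, r, s)
        if p1 is None or p0 is None:
            continue  # assumption: skip x when either conditional is undefined
        diffs.append(p1 - p0)
    return sum(diffs) / len(diffs) if diffs else None


if __name__ == "__main__":
    # Toy data: e follows a after exactly one step; b co-occurs with a.
    data = {
        "a": [1, 0, 1, 0, 1, 0, 0, 0],
        "b": [1, 1, 0, 0, 1, 1, 0, 0],
    }
    e = [0, 1, 0, 1, 0, 1, 0, 0]
    causes = [name for name in data if is_prima_facie(data[name], e, r=1, s=1)]
    for name in causes:
        print(name, eps_avg(name, e, data, causes, r=1, s=1))
    # prints "a 1.0" then "b 0.0"
```

In this toy example, b raises the frequency of e on its own (0.5 against a base rate of 3/7), so it passes the prima facie test. Once a is held fixed, however, b makes no difference and scores 0.0, while a scores 1.0. Separating genuine from spurious prima facie causes depends on exactly this contrast.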
