Revisiting Causality Inference In Markov Chain

Identifying causal relationships is a key premise of scientific research. Given the mass of observational data in many disciplines, new machine learning methods make it possible to identify unappreciated causal relationships empirically and to understand causal behavior. Conventional methods of causality inference from observational data require long time series to capture cause-and-effect relationships. We hypothesize that important causal relationships can be inferred from the composition of one-step transition rates (as in a Markov chain) to and from an event. Here we introduce 'Causality Inference using Composition of Transitions' (CICT), a computationally efficient method that reveals causal structure with high accuracy. We characterize how causes, effects, and random events differ in the composition of their inputs and outputs. To demonstrate the method, we used an administrative inpatient healthcare dataset to build a graph of patients' transitions between diagnoses. We then applied the method to this patient transition graph, revealing deep and complex causal structure between clinical conditions. The method is highly accurate in predicting whether a transition in a Markov chain is causal or random, and it performs well in identifying the direction of causality in bidirectional associations. Moreover, CICT contributes new information that enables unsupervised clustering methods to discriminate causality from randomness. Comprehensive performance analysis using C-statistics, goodness-of-fit statistics, and decision analysis of predictive models, together with comparison against medical ground truth, validates our findings.
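As a rough illustration of what "composition of one-step transition rates to and from an event" could mean in practice, the following minimal Python sketch estimates transition rates from event sequences and collects, for each event, the rates on its incoming and outgoing edges. It is not the CICT implementation: the function names, the toy sequences, and the choice to normalize counts by each source event's total outgoing count are all assumptions made for illustration, and the abstract does not specify which features CICT derives from these compositions.

```python
from collections import Counter, defaultdict


def transition_counts(sequences):
    """Count one-step transitions (src -> dst) over all event sequences."""
    counts = Counter()
    for seq in sequences:
        for src, dst in zip(seq, seq[1:]):
            counts[(src, dst)] += 1
    return counts


def transition_rates(counts):
    """Normalize counts by each source's total outgoing count (assumed definition)."""
    out_totals = Counter()
    for (src, _dst), n in counts.items():
        out_totals[src] += n
    return {(src, dst): n / out_totals[src] for (src, dst), n in counts.items()}


def composition(rates):
    """For each event, gather the sorted rates on its incoming and outgoing edges."""
    incoming = defaultdict(list)
    outgoing = defaultdict(list)
    for (src, dst), p in rates.items():
        outgoing[src].append(p)
        incoming[dst].append(p)
    nodes = set(incoming) | set(outgoing)
    return {
        node: {"in": sorted(incoming[node]), "out": sorted(outgoing[node])}
        for node in nodes
    }


# Hypothetical toy data: each list is one patient's ordered sequence of diagnoses.
sequences = [["A", "B", "C"], ["A", "B"], ["A", "C"], ["B", "C"]]
rates = transition_rates(transition_counts(sequences))
profile = composition(rates)

for node in sorted(profile):
    print(node, profile[node])
```

In this toy graph, event A has only outgoing transitions and event C has only incoming ones, so their input and output compositions differ sharply. A method in the spirit of the abstract would summarize such per-event distributions and pass them to a classifier or clustering step, but the specific summaries used here are not taken from the source.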
