Causal Inference with Rare Events in Large-Scale Time-Series Data

Large-scale observational datasets are prevalent in many areas of research, including biomedical informatics, computational social science, and finance. However, our ability to use these data for decision-making lags behind our ability to collect and mine them. One reason is the lack of methods for inferring the causal impact of rare events. Yet when monitoring continuous data streams from intensive care patients, social media, or finance, rare events may in fact be the most important ones, signaling critical changes in a patient's status or in trading volume. While prior data mining approaches can identify or predict rare events, they cannot determine their impact, and probabilistic causal inference methods fail to handle inference with infrequent events. Instead, we develop a new approach to finding the causal impact of rare events that leverages the large amount of data available to infer a model of a system's functioning and evaluates how rare events explain deviations from usual behavior. Using simulated data, we evaluate the approach and compare it against others, demonstrating that it can accurately infer the effects of rare events.
