Optimal Causal Inference

We consider an information-theoretic objective function for statistical modeling of time series that embodies a parametrized trade-off between the predictive power of a model and the model’s complexity. We study two distinct cases of optimal causal inference, which we call optimal causal filtering (OCF) and optimal causal estimation (OCE). OCF corresponds to the ideal case of having infinite data. We show that, in the limit in which the trade-off parameter tends to zero (the limit that emphasizes prediction), OCF recovers the exact causal architecture of a stochastic process. Specifically, the filtering method reconstructs the hidden causal states exactly. More generally, we establish that the method leads to a graded model-complexity hierarchy of approximations to the causal architecture. For nonideal cases with finite data (OCE), we show that the correct number of states can be found by adjusting for statistical fluctuations in probability estimates.
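The trade-off described above can be illustrated with a small numerical sketch. The abstract does not spell out the objective, so the form used here is an assumption in the style of the information bottleneck method: maximize I[R; future] - lam * I[past; R] over assignments of pasts to model states R, where a small lam emphasizes prediction. The function names, the toy distribution, and the symbols `p_xy`, `p_r_given_x`, and `lam` are all hypothetical and chosen only for illustration.

```python
from math import log2


def mutual_information(joint):
    """Mutual information I[A;B] in bits; `joint` is a list of rows p(a, b)."""
    pa = [sum(row) for row in joint]
    pb = [sum(col) for col in zip(*joint)]
    return sum(
        p * log2(p / (pa[i] * pb[j]))
        for i, row in enumerate(joint)
        for j, p in enumerate(row)
        if p > 0
    )


def objective(p_xy, p_r_given_x, lam):
    """Return (prediction, complexity, prediction - lam * complexity).

    p_xy[i][j]        joint probability of past i and future j
    p_r_given_x[i][r] probability that past i is assigned to state r
    lam               assumed trade-off parameter (small = favor prediction)
    """
    n_x, n_y, n_r = len(p_xy), len(p_xy[0]), len(p_r_given_x[0])
    p_x = [sum(row) for row in p_xy]
    # Joint over (past, state): p(x, r) = p(x) p(r|x)
    p_xr = [[p_x[i] * p_r_given_x[i][r] for r in range(n_r)] for i in range(n_x)]
    # Joint over (state, future): p(r, y) = sum_x p(r|x) p(x, y)
    p_ry = [
        [sum(p_r_given_x[i][r] * p_xy[i][j] for i in range(n_x)) for j in range(n_y)]
        for r in range(n_r)
    ]
    prediction = mutual_information(p_ry)  # I[R; future]
    complexity = mutual_information(p_xr)  # I[past; R]
    return prediction, complexity, prediction - lam * complexity


# Toy process: pasts 0 and 1 predict the same future distribution,
# so they are predictively equivalent; past 2 predicts a different one.
p_xy = [
    [0.225, 0.025],  # past 0: p(future | past) = (0.9, 0.1)
    [0.225, 0.025],  # past 1: p(future | past) = (0.9, 0.1)
    [0.100, 0.400],  # past 2: p(future | past) = (0.2, 0.8)
]

# Model A keeps every past as its own state.
one_state_per_past = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
# Model B merges the two predictively equivalent pasts.
merged_equivalent_pasts = [[1, 0], [1, 0], [0, 1]]

pred_full, cx_full, _ = objective(p_xy, one_state_per_past, lam=0.1)
pred_merged, cx_merged, _ = objective(p_xy, merged_equivalent_pasts, lam=0.1)
```

In this toy case the merged model predicts exactly as well as the model that keeps every past distinct, but at lower complexity. That is the sense in which grouping pasts with identical conditional future distributions yields a minimal model with full predictive power.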
