Estimating the Directed Information and Testing for Causality

The problem of estimating the directed information rate between two discrete processes (Xn) and (Yn) via the plug-in (or maximum-likelihood) estimator is considered. When the joint process ((Xn, Yn)) is a Markov chain of a given memory length, the plug-in estimator is shown to be asymptotically Gaussian and to converge at the optimal rate O(1/√n) under appropriate conditions; this is the first estimator that has been shown to achieve this rate. An important connection is drawn between the problem of estimating the directed information rate and that of performing a hypothesis test for the presence of causal influence between the two processes. Under fairly general conditions, the null hypothesis, which corresponds to the absence of causal influence, is equivalent to the requirement that the directed information rate be equal to zero. In that case, a finer result is established, showing that the plug-in estimator converges at the faster rate O(1/n) and that it is asymptotically χ²-distributed. This is proved by showing that the estimator is equal to (a scalar multiple of) the classical likelihood ratio statistic for the above hypothesis test. Finally, it is noted that these results facilitate the design of an actual likelihood ratio test for the presence or absence of causal influence.
