Mutual information, relative entropy, and estimation in the Poisson channel

Let X be a nonnegative random variable and let the conditional distribution of a random variable Y, given X, be Poisson(γ·X), for a parameter γ ≥ 0. We identify a natural loss function such that: (1) the derivative of the mutual information between X and Y with respect to γ is equal to the minimum mean loss in estimating X based on Y, regardless of the distribution of X; (2) when X is estimated based on Y by a mismatched estimator that would have minimized the expected loss had X been distributed according to Q, while the true law of X is P, the integral over all values of γ of the excess mean loss is equal to the relative entropy between P and Q. For a continuous-time setting where X^T is a nonnegative stochastic process and the conditional law of Y^T, given X^T, is that of a non-homogeneous Poisson process with intensity function γ·X^T, under the same loss function: (1) the minimum mean loss in causal filtering when γ = 1 is equal to the expected value of the minimum mean loss in noncausal filtering (smoothing) achieved with a channel whose parameter γ is uniformly distributed between 0 and 1. Bridging the two quantities is the mutual information between X^T and Y^T; (2) this relationship between the mean losses in causal and noncausal filtering also holds when the filters employed are mismatched, i.e., optimized assuming a law on X^T which is not the true one. Bridging the two quantities in this case is the sum of the mutual information and the relative entropy between the true and the mismatched distributions of Y^T. Thus, relative entropy quantifies the excess estimation loss due to mismatch in this setting. These results parallel those recently found for the Gaussian channel: the I-MMSE relationship of Guo et al., the relative entropy and mismatched estimation relationship of Verdú, and the relationship between causal and noncausal mismatched estimation of Weissman.
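To make the scalar-channel statements concrete, here is a minimal sketch. The specific form of the loss and the estimator notation below are assumptions of this sketch (the Bregman divergence generated by $x\log x$, which is minimized in expectation by the conditional mean), not a quotation from the paper:

\[
\ell(x,\hat{x}) \;=\; x\log\frac{x}{\hat{x}} \;-\; x \;+\; \hat{x},
\qquad
\frac{d}{d\gamma}\, I(X; Y_\gamma) \;=\; \mathbb{E}\Bigl[\ell\bigl(X,\ \mathbb{E}[X \mid Y_\gamma]\bigr)\Bigr],
\]
\[
\int_0^{\infty} \Bigl( \mathbb{E}_P\bigl[\ell\bigl(X, \hat{X}_Q(Y_\gamma)\bigr)\bigr] \;-\; \mathbb{E}_P\bigl[\ell\bigl(X, \hat{X}_P(Y_\gamma)\bigr)\bigr] \Bigr)\, d\gamma \;=\; D(P \,\|\, Q),
\]

where $Y_\gamma$ given $X$ is Poisson$(\gamma X)$ and $\hat{X}_P$, $\hat{X}_Q$ denote the estimators that minimize the expected loss when $X \sim P$ and $X \sim Q$, respectively. Under the same assumed loss, the continuous-time statements equate the causal mean loss at $\gamma = 1$, the mutual information $I(X^T; Y^T)$, and the average over $\gamma$ uniform on $[0,1]$ of the noncausal mean loss.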

[1] Xin Guo, et al., On the optimality of conditional expectation as a Bregman predictor, 2005, IEEE Trans. Inf. Theory.

[2] A. Shiryaev, et al., Limit Theorems for Stochastic Processes, 1987.

[3] J. Jacod, Multivariate point processes: predictable projection, Radon-Nikodym derivatives, representation of martingales, 1975.

[4] Alan Weiss, et al., Large Deviations For Performance Analysis: Queues, Communication and Computing, 1995.

[5] Daniel Pérez Palomar, et al., Representation of Mutual Information Via Input Estimates, 2007, IEEE Trans. Inf. Theory.

[6] Moshe Zakai, et al., Some relations between mutual information and estimation error in Wiener space, 2006.

[7] P. Dupuis, et al., The large deviation principle for a general class of queueing systems. I, 1995.

[8] J. Grandell, Mixed Poisson Processes, 1997.

[9] Amos Lapidoth, et al., On the reliability function of the ideal Poisson channel with noiseless feedback, 1993, IEEE Trans. Inf. Theory.

[10] A. Shiryayev, et al., Statistics of Random Processes II: Applications, 2000.

[11] Tsachy Weissman, et al., Mutual Information, Relative Entropy, and Estimation in the Poisson Channel, 2012, IEEE Trans. Inf. Theory.

[12] Antonia Maria Tulino, et al., Monotonic Decrease of the Non-Gaussianness of the Sum of Independent Random Variables: A Simple Proof, 2006, IEEE Trans. Inf. Theory.

[13] T. Duncan, On the Calculation of Mutual Information, 1970.

[14] Imre Csiszár, Information Theory - Coding Theorems for Discrete Memoryless Systems, Second Edition, 2011.

[15] Sergio Verdú, et al., Randomly spread CDMA: asymptotics via statistical physics, 2005, IEEE Trans. Inf. Theory.

[16] Albert Nikolaevich Shiryaev, et al., Statistics of random processes, 1977.

[17] Sergio Verdú, et al., Mismatched Estimation and Relative Entropy, 2009, IEEE Trans. Inf. Theory.

[18] Adam Shwartz, et al., Large Deviations For Performance Analysis, 2019.

[19] Amir Dembo, et al., Large Deviations Techniques and Applications, 1998.

[20] Dongning Guo, et al., Relative entropy and score function: New information-estimation relationships through arbitrary additive perturbation, 2009, IEEE International Symposium on Information Theory.

[21] Shlomo Shamai, et al., Mutual information and minimum mean-square error in Gaussian channels, 2004, IEEE Trans. Inf. Theory.

[22] Haim H. Permuter, et al., Directed information and causal estimation in continuous time, 2009, IEEE International Symposium on Information Theory.

[23] R. Bass, et al., Review: P. Billingsley, Convergence of probability measures, 1971.

[24] Wendell H. Fleming, et al., Advances in Filtering and Optimal Stochastic Control, 1982.

[25] Mokshay M. Madiman, et al., Compound Poisson Approximation via Information Functionals, 2010, arXiv.

[26] Andrea Montanari, et al., Life Above Threshold: From List Decoding to Area Theorem and MSE, 2004, arXiv.

[27] S. Verdú, Poisson Communication Theory, 2004.

[28] Andrea Montanari, et al., The Generalized Area Theorem and Some of its Consequences, 2005, IEEE Trans. Inf. Theory.

[29] Moshe Zakai, et al., On mutual information, likelihood ratios, and estimation error for the additive Gaussian channel, 2004, IEEE Trans. Inf. Theory.

[30] Neri Merhav, Optimum Estimation via Gradients of Partition Functions and Information Measures: A Statistical-Mechanical Perspective, 2011, IEEE Trans. Inf. Theory.

[31] P. Brémaud, Point Processes and Queues, 1981.

[32] Tsachy Weissman, et al., The Relationship Between Causal and Noncausal Mismatched Estimation in Continuous-Time AWGN Channels, 2010, IEEE Trans. Inf. Theory.

[33] T. Weissman, The relationship between causal and non-causal mismatched estimation in continuous-time AWGN channels, 2010, IEEE Information Theory Workshop (ITW 2010, Cairo).

[34] P. Brémaud, Point processes and queues, martingale dynamics, 1983.

[35] W. Fleming, Logarithmic Transformations and Stochastic Control, 1982.

[36] Shlomo Shamai, et al., Mutual Information and Conditional Mean Estimation in Poisson Channels, 2004, IEEE Trans. Inf. Theory.

[38] Neri Merhav, et al., A strong version of the redundancy-capacity theorem of universal coding, 1995, IEEE Trans. Inf. Theory.

[39] T. Kailath, et al., Radon-Nikodym Derivatives with Respect to Measures Induced by Discontinuous Independent-Increment Processes, 1975.