A large-deviation analysis for the maximum likelihood learning of tree structures

The problem of maximum-likelihood learning of the structure of an unknown discrete distribution from samples is considered when the distribution is Markov on a tree. Large-deviation analysis of the error in estimation of the set of edges of the tree is performed. Necessary and sufficient conditions are provided to ensure that this error probability decays exponentially. These conditions are based on the mutual information between each pair of variables being distinct from that of other pairs. The rate of error decay, or error exponent, is derived using the large-deviation principle. The error exponent is approximated using Euclidean information theory and is given by a ratio, to be interpreted as the signal-to-noise ratio (SNR) for learning. Numerical experiments show the SNR approximation is accurate.
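The estimator analysed here can be illustrated with a minimal sketch. It assumes the maximum-likelihood tree is obtained by the classical Chow-Liu procedure: compute the empirical mutual information of every pair of variables, then take a maximum-weight spanning tree over those weights. The function names `empirical_mutual_information` and `learn_tree_edges` are hypothetical and chosen only for illustration; they do not come from the paper.

```python
import itertools
import math
from collections import Counter


def empirical_mutual_information(samples, i, j):
    """Mutual information (in nats) of columns i and j under the empirical distribution."""
    n = len(samples)
    joint = Counter((s[i], s[j]) for s in samples)
    marg_i = Counter(s[i] for s in samples)
    marg_j = Counter(s[j] for s in samples)
    mi = 0.0
    for (a, b), count in joint.items():
        # p(a,b) * log( p(a,b) / (p(a) p(b)) ), written in terms of raw counts
        mi += (count / n) * math.log(count * n / (marg_i[a] * marg_j[b]))
    return mi


def learn_tree_edges(samples):
    """Return the edge set of a maximum-weight spanning tree on empirical mutual information.

    Assumption: this is a Chow-Liu style sketch using Kruskal's algorithm,
    not the authors' own implementation.
    """
    d = len(samples[0])
    weights = {
        (i, j): empirical_mutual_information(samples, i, j)
        for i, j in itertools.combinations(range(d), 2)
    }

    # Union-find structure used by Kruskal's algorithm to detect cycles.
    parent = list(range(d))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    edges = set()
    # Visit candidate edges from largest to smallest mutual information.
    for (i, j), _ in sorted(weights.items(), key=lambda kv: kv[1], reverse=True):
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            edges.add((i, j))
    return edges
```

In this sketch an estimation error occurs whenever the empirical mutual information of a non-edge pair overtakes that of a true edge along the path joining the pair, which is why the abstract's conditions concern mutual-information values being distinct across pairs.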
