Learning Gaussian Tree Models: Analysis of Error Exponents and Extremal Structures

The problem of learning tree-structured Gaussian graphical models from independent and identically distributed (i.i.d.) samples is considered. The influence of the tree structure and the parameters of the Gaussian distribution on the learning rate as the number of samples increases is discussed. Specifically, the error exponent corresponding to the event that the estimated tree structure differs from the actual unknown tree structure of the distribution is analyzed. Finding the error exponent reduces to a least-squares problem in the very noisy learning regime. In this regime, it is shown that the extremal tree structure that minimizes the error exponent is the star for any fixed set of correlation coefficients on the edges of the tree. If the magnitudes of all the correlation coefficients are less than 0.63, it is also shown that the tree structure that maximizes the error exponent is the Markov chain. In other words, the star and the chain graphs represent the hardest and the easiest structures to learn in the class of tree-structured Gaussian graphical models. This result can also be intuitively explained by correlation decay: pairs of nodes that are far apart, in terms of graph distance, are unlikely to be mistaken for edges by the maximum-likelihood estimator in the asymptotic regime.
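The maximum-likelihood estimator referred to above can be illustrated with a short sketch. It assumes the standard Chow-Liu construction: for jointly Gaussian variables the empirical mutual information of a pair is -(1/2) log(1 - rho^2), where rho is the empirical correlation coefficient, and the estimated tree is a maximum-weight spanning tree under those weights. The function names (`chow_liu_gaussian_tree`, `sample_chain`, `sample_star`) and the unit-variance sampling scheme are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def chow_liu_gaussian_tree(samples):
    """Estimate a tree structure from an (n, d) sample matrix.

    Returns a set of edges (i, j) with i < j forming a maximum-weight
    spanning tree under empirical Gaussian mutual information.
    """
    d = samples.shape[1]
    rho = np.corrcoef(samples, rowvar=False)
    # Gaussian mutual information; clip to avoid log(0) when |rho| is 1.
    mi = -0.5 * np.log(np.clip(1.0 - rho**2, 1e-12, None))

    # Kruskal's algorithm: scan candidate edges by decreasing weight.
    pairs = [(float(mi[i, j]), i, j) for i in range(d) for j in range(i + 1, d)]
    pairs.sort(reverse=True)

    parent = list(range(d))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    edges = set()
    for _, i, j in pairs:
        ri, rj = find(i), find(j)
        if ri != rj:  # adding (i, j) does not create a cycle
            parent[ri] = rj
            edges.add((i, j))
    return edges


def sample_chain(n, rhos, rng):
    """Draw n samples from a Markov chain 0 - 1 - ... with edge correlations rhos."""
    x = np.empty((n, len(rhos) + 1))
    x[:, 0] = rng.standard_normal(n)
    for k, r in enumerate(rhos):
        noise = rng.standard_normal(n)
        x[:, k + 1] = r * x[:, k] + np.sqrt(1.0 - r**2) * noise
    return x


def sample_star(n, rhos, rng):
    """Draw n samples from a star centered at node 0 with edge correlations rhos."""
    x = np.empty((n, len(rhos) + 1))
    x[:, 0] = rng.standard_normal(n)
    for k, r in enumerate(rhos):
        noise = rng.standard_normal(n)
        x[:, k + 1] = r * x[:, 0] + np.sqrt(1.0 - r**2) * noise
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rhos = [0.8, 0.7, 0.6]  # arbitrary illustrative edge correlations
    print(chow_liu_gaussian_tree(sample_chain(5000, rhos, rng)))
    print(chow_liu_gaussian_tree(sample_star(5000, rhos, rng)))
```

Repeating the estimate over many independent draws at a small sample size, and counting how often the returned edge set differs from the true one, gives an empirical error probability for each structure that can be compared against the star-versus-chain ordering described in the abstract.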
