Learning High-Dimensional Markov Forest Distributions: Analysis of Error Rates

The problem of learning forest-structured discrete graphical models from i.i.d. samples is considered. An algorithm that prunes the Chow-Liu tree via adaptive thresholding is proposed. This algorithm is shown to be both structurally consistent and risk consistent; moreover, for a fixed model size, its probability of error in structure learning decays faster than any polynomial in the number of samples. For the high-dimensional regime, in which the model size d and the number of edges k scale with the number of samples n, sufficient conditions on (n, d, k) are given for the algorithm to remain structurally and risk consistent. In addition, the extremal structures for learning are identified: the independent (resp., tree) model is proved to be the hardest (resp., easiest) to learn with the proposed algorithm, in terms of the error rate for structure learning.
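To make the two-stage procedure concrete, the sketch below first builds the Chow-Liu tree (the maximum-weight spanning tree under empirical mutual information, computed here with Kruskal's algorithm) and then prunes tree edges whose empirical mutual information falls below a threshold. This is a minimal illustration under stated assumptions, not the paper's reference implementation: the function names are invented for the example, and the threshold schedule n^{-1/4} used in the usage snippet is purely illustrative (the paper imposes its own decay conditions on the threshold sequence, which the abstract does not spell out).

```python
import numpy as np
from itertools import combinations

def empirical_mutual_information(x, y):
    """Plug-in (maximum-likelihood) estimate of I(X; Y) in nats
    for two discrete sample vectors x and y of equal length."""
    n = len(x)
    joint, px, py = {}, {}, {}
    for a, b in zip(x, y):
        joint[(a, b)] = joint.get((a, b), 0) + 1
        px[a] = px.get(a, 0) + 1
        py[b] = py.get(b, 0) + 1
    mi = 0.0
    for (a, b), c in joint.items():
        p_ab = c / n
        mi += p_ab * np.log(p_ab / ((px[a] / n) * (py[b] / n)))
    return max(mi, 0.0)  # guard against tiny negative rounding error

def chow_liu_forest(samples, threshold):
    """Learn a forest: build the Chow-Liu tree (maximum-weight spanning
    tree under empirical mutual information, via Kruskal's algorithm),
    then discard tree edges whose weight falls below `threshold`."""
    n, d = samples.shape
    # Weight every candidate edge by its empirical mutual information.
    edges = [(empirical_mutual_information(samples[:, i], samples[:, j]), i, j)
             for i, j in combinations(range(d), 2)]
    edges.sort(reverse=True)  # Kruskal: consider heaviest edges first

    parent = list(range(d))   # union-find over the d variables
    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    forest = []
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:              # edge belongs to the Chow-Liu tree
            parent[ri] = rj
            if w >= threshold:    # pruning step: keep only strong edges
                forest.append((i, j, w))
    return forest

# Illustrative usage: with independent columns, the pruning step
# typically removes every edge, so the empty (true) forest is returned.
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(2000, 6))
print(chow_liu_forest(X, threshold=2000 ** -0.25))
```

Running the snippet on independent data usually prints an empty list, consistent with the abstract's claim that the independent model is recoverable (if hardest to learn); on data generated from a tree, edges with large empirical mutual information survive the pruning.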
