Learning Tree Structures from Noisy Data

We provide high-probability sample complexity guarantees for exact structure recovery of tree-structured graphical models when only noisy observations of the vertex emissions are available. We assume that the hidden variables follow either an Ising model or a Gaussian graphical model, and that the observables are noise-corrupted versions of the hidden variables: we consider multiplicative $\pm 1$ binary noise for Ising models and additive Gaussian noise for Gaussian models. Such hidden models arise naturally in applications across physics, biology, computer science, and finance. We study the impact of measurement noise on the task of learning the underlying tree structure via the well-known \textit{Chow-Liu algorithm}. In particular, for a tree with $p$ vertices and probability of failure $\delta>0$, we show that the number of samples needed for exact structure recovery is of the order of $\mathcal{O}(\log(p/\delta))$ for Ising models (which remains the \textit{same as in the noiseless case}), and $\mathcal{O}(\mathrm{polylog}(p/\delta))$ for Gaussian models.
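
As a rough illustration of the Ising setting described above, the following Python sketch simulates a noisy tree and runs a Chow-Liu style estimator on the noisy samples. It is a minimal sketch under stated assumptions, not the procedure analyzed in the paper: the hidden model is taken to be zero-field with a single edge correlation `theta`, every vertex is flipped independently with the same probability `q`, and the function names (`sample_noisy_ising_tree`, `chow_liu_tree`) are hypothetical. Under these assumptions the noisy pairwise correlations equal $(1-2q)^2$ times the hidden ones, so a maximum-weight spanning tree over absolute empirical correlations can still target the hidden tree.

```python
import numpy as np


def sample_noisy_ising_tree(edges, theta, q, n, p, rng):
    """Sample a zero-field tree Ising model, then apply multiplicative +/-1 noise.

    Assumptions (illustrative only): `edges` lists (parent, child) pairs in an
    order where each parent is sampled before its child, vertex 0 is the root,
    every edge has correlation `theta`, and every vertex is flipped with the
    same probability `q`.
    """
    x = np.empty((n, p), dtype=int)
    x[:, 0] = rng.choice([-1, 1], size=n)  # root is uniform on {-1, +1}
    for parent, child in edges:
        # The child copies the parent with probability (1 + theta) / 2,
        # which gives E[x_parent * x_child] = theta.
        agree = rng.random(n) < (1 + theta) / 2
        x[:, child] = np.where(agree, x[:, parent], -x[:, parent])
    noise = np.where(rng.random((n, p)) < q, -1, 1)  # -1 with probability q
    return x * noise


def chow_liu_tree(weights):
    """Maximum-weight spanning tree via Kruskal's algorithm with union-find.

    `weights` is a symmetric (p, p) array of pairwise scores, here absolute
    empirical correlations. Returns a set of edges (i, j) with i < j.
    """
    p = weights.shape[0]
    root = list(range(p))

    def find(u):
        while root[u] != u:
            root[u] = root[root[u]]  # path halving
            u = root[u]
        return u

    pairs = sorted(
        ((weights[i, j], i, j) for i in range(p) for j in range(i + 1, p)),
        reverse=True,
    )
    tree = set()
    for _, i, j in pairs:
        ri, rj = find(i), find(j)
        if ri != rj:  # adding (i, j) does not create a cycle
            root[ri] = rj
            tree.add((i, j))
    return tree


# Small demonstration on a path 0 - 1 - ... - 5 (parameters chosen arbitrarily).
rng = np.random.default_rng(0)
p, n, theta, q = 6, 5000, 0.8, 0.1
path_edges = [(i, i + 1) for i in range(p - 1)]
y = sample_noisy_ising_tree(path_edges, theta, q, n, p, rng)
emp_corr = (y.T @ y) / n  # variables are +/-1 with zero mean
estimate = chow_liu_tree(np.abs(emp_corr))
print("estimated edges:", sorted(estimate))
print("exact recovery:", estimate == set(path_edges))
```

Using absolute correlations as edge weights relies on the zero-field assumption, where mutual information between two $\pm 1$ variables is increasing in the magnitude of their correlation; with external fields or vertex-dependent noise levels, the weights would need to be empirical mutual informations instead.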
