Learning Factor Graphs in Polynomial Time and Sample Complexity

We study the computational and sample complexity of parameter and structure learning in graphical models. Our main result shows that the class of factor graphs with bounded degree can be learned in polynomial time and from a polynomial number of training examples, assuming that the data is generated by a network in this class. The result covers both parameter estimation for a known network structure and structure learning. As a corollary, it implies that factor graphs for both Bayesian networks and Markov networks of bounded degree can be learned with polynomial time and sample complexity. Importantly, unlike standard maximum likelihood estimation algorithms, our method does not require inference in the underlying network, and so it applies to networks where inference is intractable. We also show that the error of our learned model degrades gracefully when the generating distribution is not a member of the target class of networks. In addition to our main result, we show that the sample complexity of parameter learning in graphical models has an O(1) dependence on the number of variables in the model when the performance criterion is the KL divergence normalized by the number of variables.
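The claim that learning avoids inference is worth unpacking: because every factor has bounded scope, its sufficient statistics are counts over a bounded-size subset of variables, which can be read directly off the training set in time polynomial in the number of examples. The sketch below illustrates only that local-counting step, under assumptions of our own: the function name, the uniform variable cardinality, and the Laplace smoothing are illustrative choices, and the paper's actual estimator (based on a canonical parameterization of the factors) involves more than raw marginal counts.

```python
import itertools
from collections import Counter

import numpy as np

def empirical_factor(samples, scope, card, alpha=1.0):
    """Estimate the empirical distribution over a factor's scope by counting.

    samples: (m, n) integer array, one row per training example
    scope:   tuple of variable indices forming the factor's scope
    card:    number of states per variable (assumed uniform here)
    alpha:   Laplace smoothing pseudo-count

    Every quantity is a count over a bounded-size variable subset, so no
    inference in the full network is ever performed; the cost is
    O(m * |scope|) counting plus O(card ** |scope|) enumeration, which is
    polynomial when the factor scopes are bounded.
    """
    idx = list(scope)
    counts = Counter(tuple(row[idx]) for row in samples)
    total = len(samples) + alpha * card ** len(scope)
    return {
        assignment: (counts.get(assignment, 0) + alpha) / total
        for assignment in itertools.product(range(card), repeat=len(scope))
    }

# Usage: three binary variables, a factor over variables (0, 1).
rng = np.random.default_rng(0)
samples = rng.integers(0, 2, size=(1000, 3))
print(empirical_factor(samples, scope=(0, 1), card=2))
```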
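The final sentence refers to the per-variable KL divergence. Written out with the standard definition, where n is the number of variables, P* the generating distribution, and P-hat the learned model:

\[
\frac{1}{n}\, D\!\left(P^{*} \,\middle\|\, \hat{P}\right)
  \;=\; \frac{1}{n} \sum_{\mathbf{x}} P^{*}(\mathbf{x})
        \log \frac{P^{*}(\mathbf{x})}{\hat{P}(\mathbf{x})}.
\]

The O(1) claim is that the number of samples required to drive this normalized divergence below a fixed tolerance does not grow with n.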
