Hardness of parameter estimation in graphical models

We consider the problem of learning the canonical parameters specifying an undirected graphical model (Markov random field) from its mean parameters. For graphical models representing a minimal exponential family, the canonical parameters are uniquely determined by the mean parameters, so the problem is feasible in principle. The goal of this paper is to investigate the computational feasibility of this statistical task. Our main result shows that parameter estimation is in general intractable: unless RP = NP, no algorithm can learn the canonical parameters of a generic pairwise binary graphical model from the mean parameters in time bounded by a polynomial in the number of variables. Such a result has long been believed to be true (see the monograph by Wainwright and Jordan (2008)), but no proof was known. Our proof gives a polynomial-time reduction from approximating the partition function of the hard-core model, which is known to be hard, to learning approximate parameters. The reduction entails showing that the boundary of the marginal polytope has an inherent repulsive property, which validates an optimization procedure over the polytope that uses no knowledge of the polytope's structure (knowledge that the ellipsoid method and other approaches require).
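To make the two parameterizations concrete, the following minimal sketch computes the mean parameters of a tiny pairwise binary model by exhaustive enumeration and then recovers the canonical parameters by moment-matching gradient ascent. It is an illustration only, not the reduction used in the paper: the function names `mean_parameters` and `fit_canonical`, the {0,1} encoding of the variables, the step size, and the example graph are all assumptions made here. The enumeration visits all 2^n configurations, so this approach works only for very small n, which is consistent with the hardness result stated above.

```python
import itertools
import math


def mean_parameters(n, edges, theta_node, theta_edge):
    """Return (mu_node, mu_edge, Z) for a pairwise binary model on {0,1}^n.

    p(x) is proportional to exp(sum_i theta_i x_i + sum_(i,j) theta_ij x_i x_j).
    Exhaustive enumeration: cost grows as 2^n.
    """
    configs = list(itertools.product([0, 1], repeat=n))
    weights = []
    for x in configs:
        score = sum(theta_node[i] * x[i] for i in range(n))
        score += sum(t * x[i] * x[j] for t, (i, j) in zip(theta_edge, edges))
        weights.append(math.exp(score))
    Z = sum(weights)  # partition function
    mu_node = [sum(w * x[i] for w, x in zip(weights, configs)) / Z
               for i in range(n)]
    mu_edge = [sum(w * x[i] * x[j] for w, x in zip(weights, configs)) / Z
               for (i, j) in edges]
    return mu_node, mu_edge, Z


def fit_canonical(n, edges, mu_node, mu_edge, steps=5000, lr=0.5):
    """Recover canonical parameters from mean parameters (illustrative only).

    Gradient ascent on the concave log-likelihood: the gradient is the gap
    between the target moments and the moments of the current model.
    The step size lr is an assumption suited to this small example.
    """
    theta_node = [0.0] * n
    theta_edge = [0.0] * len(edges)
    for _ in range(steps):
        m_node, m_edge, _ = mean_parameters(n, edges, theta_node, theta_edge)
        theta_node = [t + lr * (a - b)
                      for t, a, b in zip(theta_node, mu_node, m_node)]
        theta_edge = [t + lr * (a - b)
                      for t, a, b in zip(theta_edge, mu_edge, m_edge)]
    return theta_node, theta_edge


if __name__ == "__main__":
    # Hypothetical example: a path on three variables.
    edges = [(0, 1), (1, 2)]
    mu_node, mu_edge, _ = mean_parameters(3, edges, [0.3, -0.2, 0.5], [-0.8, 0.4])
    print(fit_canonical(3, edges, mu_node, mu_edge))
```

Each gradient step calls `mean_parameters`, which is itself an inference problem, so the cost of the forward map dominates the cost of this fitting loop.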

[1]  Allan Sly, et al.  Computational Transition at the Uniqueness Threshold, 2010, 2010 IEEE 51st Annual Symposium on Foundations of Computer Science.

[2]  Andrea Montanari, et al.  Computational Implications of Reducing Data to Sufficient Statistics, 2014, ArXiv.

[3]  G. Ziegler.  Lectures on 0/1-Polytopes, 1999, math/9909177.

[4]  Allan Sly, et al.  The Computational Hardness of Counting in Two-Spin Models on d-Regular Graphs, 2012, 2012 IEEE 53rd Annual Symposium on Foundations of Computer Science.

[5]  Jean C. Walrand, et al.  Distributed Random Access Algorithm: Scheduling and Congestion Control, 2009, IEEE Transactions on Information Theory.

[6]  Sorin Istrail, et al.  Statistical mechanics, three-dimensionality and NP-completeness: I. Universality of intractability for the partition function of the Ising model across non-planar surfaces (extended abstract), 2000, STOC '00.

[7]  Dror Weitz, et al.  Counting independent sets up to the tree threshold, 2006, STOC '06.

[8]  Michel Deza, et al.  Geometry of cuts and metrics, 2009, Algorithms and combinatorics.

[9]  Christos H. Papadimitriou.  Turing - a novel about computation, 2003.

[10]  Mark Jerrum, et al.  Polynomial-Time Approximation Algorithms for the Ising Model, 1990, SIAM J. Comput.

[11]  Eric Vigoda, et al.  Inapproximability of the Partition Function for the Antiferromagnetic Ising and Hard-Core Models, 2012, Combinatorics, Probability and Computing.

[12]  Martin E. Dyer, et al.  On Counting Independent Sets in Sparse Graphs, 2002, SIAM J. Comput.

[13]  D. Welsh, et al.  On the computational complexity of the Jones and Tutte polynomials, 1990, Mathematical Proceedings of the Cambridge Philosophical Society.

[14]  Salil P. Vadhan, et al.  Computational Complexity, 2005, Encyclopedia of Cryptography and Security.

[15]  Michael I. Jordan, et al.  Graphical Models, Exponential Families, and Variational Inference, 2008, Found. Trends Mach. Learn.

[16]  Elchanan Mossel, et al.  The Complexity of Distinguishing Markov Random Fields, 2008, APPROX-RANDOM.

[17]  Leslie G. Valiant, et al.  Random Generation of Combinatorial Structures from a Uniform Distribution, 1986, Theor. Comput. Sci.

[18]  David R. Karger, et al.  Learning Markov networks: maximum bounded tree-width graphs, 2001, SODA '01.

[19]  Yurii Nesterov, et al.  Introductory Lectures on Convex Optimization - A Basic Course, 2014, Applied Optimization.

[20]  Sorin Istrail, et al.  Statistical Mechanics, Three-Dimensionality and NP-Completeness: I. Universality of Intractability of the Partition Functions of the Ising Model Across Non-Planar Lattices, 2000, STOC 2000.

[21]  Eric Vigoda, et al.  Fast convergence of the Glauber dynamics for sampling independent sets, 1999, Random Struct. Algorithms.

[22]  Tim Roughgarden, et al.  Marginals-to-Models Reducibility, 2013, NIPS.

[23]  K. Schittkowski, et al.  Nonlinear Programming, 2022.

[24]  丸山 徹.  Convex Analysisの二,三の進展について, 1977.

[25]  Sébastien Bubeck, et al.  Theory of Convex Optimization for Machine Learning, 2014, ArXiv.

[26]  John N. Tsitsiklis, et al.  Hardness of Low Delay Network Scheduling, 2011, IEEE Transactions on Information Theory.

[27]  Ambuj Tewari, et al.  Regularization Techniques for Learning with Matrices, 2009, J. Mach. Learn. Res.

[28]  J. Borwein, et al.  Convex Functions: Constructions, Characterizations and Counterexamples, 2010.