Restricted Boltzmann Machines are Hard to Approximately Evaluate or Simulate

Restricted Boltzmann Machines (RBMs) are a type of probability model over the Boolean cube {-1, 1}^n that have recently received much attention. We establish the intractability of two basic computational tasks involving RBMs, even when only a coarse approximation to the correct output is required. We first show that, assuming P ≠ NP, for any fixed positive constant K (which may be arbitrarily large) there is no polynomial-time algorithm for the following problem: given an n-bit input string x and the parameters of an RBM M, output an estimate of the probability assigned to x by M that is accurate to within a multiplicative factor of e^{Kn}. This hardness result holds even if the parameters of M are constrained to be at most Ψ(n) for any function Ψ(n) that grows faster than linearly, and even if the number of hidden nodes of M is at most n. We then show that, assuming RP ≠ NP, there is no polynomial-time randomized algorithm for the following problem: given the parameters of an RBM M, generate a random example from a probability distribution whose total variation distance from the distribution defined by M is at most 1/12.
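
The paper itself contains no code; the following is a minimal sketch, in Python, of the evaluation task the abstract refers to, assuming the standard RBM parameterization with a weight matrix W, visible bias vector b, and hidden bias vector c (names chosen here for illustration). Summing out the hidden units has a closed form; the difficulty lies entirely in the normalizing constant Z, which the brute-force routine below computes only for small n, in line with the hardness results stated above.

```python
import itertools
import numpy as np

def log_unnormalized_prob(x, W, b, c):
    """Log of the unnormalized probability an RBM assigns to a visible
    vector x in {-1, 1}^n, with the hidden units summed out analytically:
        sum_h exp(x^T W h + b^T x + c^T h)
      = exp(b^T x) * prod_j 2*cosh(c_j + (W^T x)_j)."""
    activation = c + W.T @ x
    return b @ x + np.sum(np.log(2 * np.cosh(activation)))

def log_prob_bruteforce(x, W, b, c):
    """Exact log-probability, feasible only for small n: the partition
    function Z is a sum over all 2^n visible configurations, and the paper
    shows that even a coarse multiplicative approximation is intractable."""
    n = len(b)
    log_terms = [log_unnormalized_prob(np.array(v), W, b, c)
                 for v in itertools.product([-1, 1], repeat=n)]
    log_Z = np.logaddexp.reduce(log_terms)
    return log_unnormalized_prob(x, W, b, c) - log_Z

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, m = 6, 4                      # visible units, hidden units (m <= n, as in the abstract)
    W = rng.normal(size=(n, m))      # visible-hidden weights (illustrative values)
    b = rng.normal(size=n)           # visible biases
    c = rng.normal(size=m)           # hidden biases
    x = rng.choice([-1, 1], size=n)
    print(log_prob_bruteforce(x, W, b, c))
```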
