Stochastic Complexity and Generalization Error of a Restricted Boltzmann Machine in Bayesian Estimation

In this paper, we consider the asymptotic form of the generalization error for the restricted Boltzmann machine in Bayesian estimation. It has been shown that the maximum pole of the zeta function, which is defined by using the Kullback function, determines the asymptotic form of the generalization error for hierarchical learning models (Watanabe, 2001a,b). We use two methods to obtain the maximum pole: a new eigenvalue analysis method and a recursive blowing-up process. We show that these methods are effective for obtaining the asymptotic form of the generalization error of hierarchical learning models.
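
As a brief, hedged recap of the result underlying this relationship (Watanabe, 2001a,b): writing K(w) for the Kullback function between the true distribution and the model with parameter w, \varphi(w) for the prior, and n for the sample size (notation restated here for illustration, not quoted from the abstract), the zeta function is
\[
  \zeta(z) = \int K(w)^{z}\,\varphi(w)\,dw ,
\]
and if its maximum pole is z = -\lambda with multiplicity m, the stochastic complexity F(n) and the expected generalization error E[G(n)] satisfy, asymptotically,
\[
  F(n) = \lambda \log n - (m-1)\log\log n + O(1), \qquad
  \mathrm{E}[G(n)] \cong \frac{\lambda}{n} - \frac{m-1}{n\log n}.
\]
The two methods mentioned above (eigenvalue analysis and the recursive blowing-up process) are directed at computing this \lambda and m for the restricted Boltzmann machine.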

[1] Shun-ichi Amari, et al. Network information criterion-determining the number of hidden units for an artificial neural network model, 1994, IEEE Trans. Neural Networks.

[2] J. Brasselet. Introduction to toric varieties, 2004.

[3] Geoffrey E. Hinton, et al. Restricted Boltzmann machines for collaborative filtering, 2007, ICML '07.

[4] Kenji Yamanishi, et al. A Decision-Theoretic Extension of Stochastic Complexity and Its Applications to Learning, 1998, IEEE Trans. Inf. Theory.

[5] Sumio Watanabe, et al. Stochastic complexities of reduced rank regression in Bayesian estimation, 2005, Neural Networks.

[6] Sumio Watanabe. Algebraic geometry and statistical learning theory, 2009.

[7] J. Rissanen. Stochastic Complexity and Modeling, 1986.

[8] Shun-ichi Amari, et al. Statistical Theory of Learning Curves under Entropic Loss Criterion, 1993, Neural Computation.

[9] Sumio Watanabe, et al. Asymptotic Behavior of Free Energy of General Boltzmann Machines in Mean Field Approximation, 2006.

[10] H. Hironaka. Resolution of Singularities of an Algebraic Variety Over a Field of Characteristic Zero: II, 1964.

[11] Miki Aoyagi, et al. Resolution of Singularities and the Generalization Error with Bayesian Estimation for Layered Neural Network, 2005.

[12] J. Hartigan. A failure of likelihood asymptotics for normal mixtures, 1985.

[13] Sumio Watanabe. Algebraic geometrical methods for hierarchical learning machines, 2001, Neural Networks.

[14] Jorma Rissanen, et al. Universal coding, information, prediction, and estimation, 1984, IEEE Trans. Inf. Theory.

[15] Geoffrey E. Hinton, et al. Reinforcement Learning with Factored States and Actions, 2004, J. Mach. Learn. Res.

[16] Hirotugu Akaike, et al. Likelihood and the Bayes procedure, 1980.

[17] B. G. Quinn, et al. The determination of the order of an autoregression, 1979.

[18] Katsuyuki Hagiwara, et al. On the problem of applying AIC to determine the structure of a layered feedforward neural network, 1993, Proceedings of 1993 International Conference on Neural Networks (IJCNN-93-Nagoya, Japan).

[19] Sumio Watanabe, et al. Algebraic Analysis for Nonidentifiable Learning Machines, 2001, Neural Computation.

[20] Dan Geiger, et al. Asymptotic Model Selection for Naive Bayesian Networks, 2002, J. Mach. Learn. Res.

[21] Héctor J. Sussmann, et al. Uniqueness of the weights for minimal feedforward nets with a given input-output map, 1992, Neural Networks.

[22] Sumio Watanabe, et al. Analysis of Exchange Ratio for Exchange Monte Carlo Method, 2007, 2007 IEEE Symposium on Foundations of Computational Intelligence.

[23] H. Akaike. A new look at the statistical model identification, 1974.

[24] Shun-ichi Amari, et al. Four Types of Learning Curves, 1992, Neural Computation.

[25] David J. C. MacKay, et al. Bayesian Interpolation, 1992, Neural Computation.

[26] G. Schwarz. Estimating the Dimension of a Model, 1978.

[27] Sumio Watanabe, et al. Singularities in complete bipartite graph-type Boltzmann machines and upper bounds of stochastic complexities, 2005, IEEE Transactions on Neural Networks.

[28] Kenji Fukumizu, et al. A Regularity Condition of the Information Matrix of a Multilayer Perceptron Network, 1996, Neural Networks.