Bias-Variance Trade-Off in Hierarchical Probabilistic Models Using Higher-Order Feature Interactions

Hierarchical probabilistic models can use a large number of parameters to achieve high representational power. However, it is well known that increasing the number of parameters also increases model complexity, leading to a bias-variance trade-off. Although this is a classical problem, the bias-variance trade-off between hidden layers and higher-order interactions has not been well studied. In this study, we propose an efficient inference algorithm for the log-linear formulation of the higher-order Boltzmann machine that combines Gibbs sampling with annealed importance sampling. We then perform a bias-variance decomposition to compare hidden layers with higher-order interactions. Our results show that hidden layers and higher-order interactions yield errors of a comparable order of magnitude, and that higher-order interactions produce less variance for smaller sample sizes.
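The inference idea summarized above can be illustrated concretely. The following is a minimal sketch, not the paper's implementation: it estimates the log partition function of a small higher-order Boltzmann machine by annealed importance sampling with Gibbs-sampling transitions. The number of variables, the maximum interaction order, the temperature schedule, and the parameter values are illustrative assumptions.

```python
# Minimal sketch (illustrative assumptions, not the paper's exact algorithm):
# AIS with Gibbs transitions for a small higher-order Boltzmann machine.
import itertools
import numpy as np

rng = np.random.default_rng(0)

n = 6          # number of binary variables (kept tiny so exact log Z is computable)
max_order = 3  # include interactions up to third order (an assumption)

# Log-linear parameters theta_S for every non-empty subset S with |S| <= max_order.
interactions = [S for k in range(1, max_order + 1)
                for S in itertools.combinations(range(n), k)]
theta = {S: rng.normal(scale=0.3) for S in interactions}

def score(x):
    """Unnormalized log-probability f(x) = sum_S theta_S * prod_{i in S} x_i."""
    return sum(t for S, t in theta.items() if all(x[i] for i in S))

def gibbs_sweep(x, beta):
    """One Gibbs sweep targeting p_beta(x) proportional to exp(beta * f(x))."""
    for i in range(n):
        x[i] = 1
        f1 = score(x)
        x[i] = 0
        f0 = score(x)
        p1 = 1.0 / (1.0 + np.exp(-beta * (f1 - f0)))
        x[i] = int(rng.random() < p1)
    return x

def ais_log_partition(num_chains=100, num_temps=200):
    """AIS estimate of log Z, annealing from the uniform base distribution (beta = 0)."""
    betas = np.linspace(0.0, 1.0, num_temps + 1)
    log_weights = np.zeros(num_chains)
    for c in range(num_chains):
        x = rng.integers(0, 2, size=n)           # exact sample from p_0 (uniform)
        for t in range(1, len(betas)):
            log_weights[c] += (betas[t] - betas[t - 1]) * score(x)
            x = gibbs_sweep(x, betas[t])         # kernel leaving p_{beta_t} invariant
    log_z0 = n * np.log(2.0)                     # partition function of the uniform base
    return log_z0 + np.logaddexp.reduce(log_weights) - np.log(num_chains)

# Brute-force log Z for comparison (feasible only because n is tiny).
exact = np.logaddexp.reduce([score(np.array(x))
                             for x in itertools.product([0, 1], repeat=n)])
print(f"AIS estimate: {ais_log_partition():.3f}   exact: {exact:.3f}")
```

The annealed weights give an unbiased estimate of the partition-function ratio between the uniform base distribution and the target model; averaging them over chains and adding log Z_0 = n log 2 yields the log partition function needed for likelihood evaluation.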
