Research on the Classification Ability of Deep Belief Networks on Small and Medium Datasets

Abstract

Recent theoretical advances in the training of deep artificial neural networks have made it possible to overcome the vanishing gradient problem. This is achieved with a pre-training step in which deep belief networks, formed by stacking Restricted Boltzmann Machines, are trained without supervision. Once pre-training is complete, the network weights are fine-tuned with standard error back-propagation, treating the network as a feed-forward network. In this paper we compare this approach with commonly used classification methods on several well-known classification datasets from the UCI repository, as well as on one mid-sized proprietary dataset.
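The pre-training and fine-tuning procedure summarised above can be sketched in NumPy as follows. This is a minimal illustration under our own assumptions, not the authors' implementation: binary inputs, Bernoulli-Bernoulli RBMs trained with one-step contrastive divergence (CD-1) and mean-field reconstructions, a softmax output layer added on top of the stack for fine-tuning, and full-batch gradient descent on the cross-entropy loss. The names `RBM`, `pretrain`, `forward`, `finetune` and `predict`, all hyperparameters, and the synthetic two-prototype dataset are hypothetical choices made for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class RBM:
    """Bernoulli-Bernoulli RBM trained with CD-1 (assumption: binary or [0, 1]-valued inputs)."""

    def __init__(self, n_visible, n_hidden, rng):
        self.W = 0.01 * rng.standard_normal((n_visible, n_hidden))
        self.b = np.zeros(n_visible)  # visible biases
        self.c = np.zeros(n_hidden)   # hidden biases
        self.rng = rng

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.c)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b)

    def cd1_step(self, v0, lr):
        ph0 = self.hidden_probs(v0)                            # positive phase
        h0 = (self.rng.random(ph0.shape) < ph0).astype(float)  # sample binary hidden states
        pv1 = self.visible_probs(h0)                           # mean-field reconstruction
        ph1 = self.hidden_probs(pv1)                           # negative phase
        n = len(v0)
        self.W += lr * (v0.T @ ph0 - pv1.T @ ph1) / n
        self.b += lr * (v0 - pv1).mean(axis=0)
        self.c += lr * (ph0 - ph1).mean(axis=0)


def pretrain(X, layer_sizes, epochs=20, lr=0.1, batch=32, seed=0):
    """Unsupervised, layer by layer: each RBM models the hidden activations of the one below."""
    rng = np.random.default_rng(seed)
    rbms, data = [], X
    for n_hidden in layer_sizes:
        rbm = RBM(data.shape[1], n_hidden, rng)
        for _ in range(epochs):
            order = rng.permutation(len(data))
            for start in range(0, len(data), batch):
                rbm.cd1_step(data[order[start:start + batch]], lr)
        rbms.append(rbm)
        data = rbm.hidden_probs(data)  # input for the next RBM in the stack
    return rbms


def forward(Ws, bs, X):
    """Sigmoid hidden layers, softmax output; returns each layer's input and class probabilities."""
    acts = [X]
    for W, b in zip(Ws[:-1], bs[:-1]):
        acts.append(sigmoid(acts[-1] @ W + b))
    logits = acts[-1] @ Ws[-1] + bs[-1]
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    P = np.exp(logits)
    return acts, P / P.sum(axis=1, keepdims=True)


def finetune(rbms, X, y, n_classes, epochs=200, lr=0.5, seed=0):
    """Initialise a feed-forward net from the RBMs, add a softmax layer, train by back-propagation."""
    rng = np.random.default_rng(seed)
    Ws = [r.W.copy() for r in rbms] + [0.01 * rng.standard_normal((rbms[-1].W.shape[1], n_classes))]
    bs = [r.c.copy() for r in rbms] + [np.zeros(n_classes)]
    Y = np.eye(n_classes)[y]
    for _ in range(epochs):
        acts, P = forward(Ws, bs, X)
        delta = (P - Y) / len(X)  # gradient of mean cross-entropy w.r.t. the logits
        for i in range(len(Ws) - 1, -1, -1):
            gW, gb = acts[i].T @ delta, delta.sum(axis=0)
            if i > 0:  # back through the sigmoid below, using the pre-update weights
                delta = (delta @ Ws[i].T) * acts[i] * (1.0 - acts[i])
            Ws[i] -= lr * gW
            bs[i] -= lr * gb
    return Ws, bs


def predict(Ws, bs, X):
    return forward(Ws, bs, X)[1].argmax(axis=1)


# Usage on hypothetical synthetic binary data (two noisy prototypes), not the paper's datasets.
rng = np.random.default_rng(42)
y = rng.integers(0, 2, size=200)
proto = np.r_[np.full(10, 0.95), np.full(10, 0.05)]
X = (rng.random((200, 20)) < np.where(y[:, None] == 0, proto, 1.0 - proto)).astype(float)
rbms = pretrain(X, layer_sizes=[16, 8])
Ws, bs = finetune(rbms, X, y, n_classes=2)
acc = (predict(Ws, bs, X) == y).mean()
print(f"training accuracy: {acc:.2f}")
```

No labels are used until `finetune`; there the RBM weights and hidden biases become the initial weights of the feed-forward network, while the visible biases are discarded.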
