Modeling Speech Perception with Restricted Boltzmann Machines

the training of multiple layers of representation. In this paper, we apply the RBM learning algorithm to speech perception. We show that RBMs can be used to achieve good performance in the recognition of isolated spoken digits using a multi-layer deep belief network (consisting of a number of stacked RBMs). This performance, however, appears to depend on the fine-tuning of weights with the supervised back-propagation algorithm. To investigate how central the role of back-propagation is, we compare the performance of a number of deep belief networks using fine-tuning with the performance of the same network architectures without fine-tuning. Furthermore, since one of the main strengths of RBMs is their ability to build up multiple layers of representation, we combine the question of fine-tuning with the question of how beneficial additional layers are for the performance of the networks. To see whether the representations that emerge at higher levels make classification easier, we also apply a simple perceptron classifier to the different levels of the deep belief networks when they are trained without fine-tuning.
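The greedy layer-wise procedure described above can be sketched as follows. This is a minimal illustrative implementation, not the paper's actual setup: a Bernoulli-Bernoulli RBM trained with one-step contrastive divergence (CD-1), stacked so that each RBM learns from the hidden activations of the one below; all names (`RBM`, `train_dbn`), layer sizes, and the toy binary data are assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    """Bernoulli-Bernoulli RBM trained with CD-1 (illustrative sketch)."""
    def __init__(self, n_visible, n_hidden, lr=0.1):
        self.W = rng.normal(0, 0.01, size=(n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)  # visible biases
        self.b_h = np.zeros(n_hidden)   # hidden biases
        self.lr = lr

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b_h)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b_v)

    def cd1_step(self, v0):
        # Positive phase: sample hidden units given the data.
        h0 = self.hidden_probs(v0)
        h0_sample = (rng.random(h0.shape) < h0).astype(float)
        # Negative phase: one Gibbs step back to a reconstruction.
        v1 = self.visible_probs(h0_sample)
        h1 = self.hidden_probs(v1)
        # CD-1 gradient approximation, averaged over the batch.
        n = v0.shape[0]
        self.W += self.lr * (v0.T @ h0 - v1.T @ h1) / n
        self.b_v += self.lr * (v0 - v1).mean(axis=0)
        self.b_h += self.lr * (h0 - h1).mean(axis=0)

def train_dbn(data, layer_sizes, epochs=5):
    """Greedy layer-wise pre-training: each RBM is trained on the
    hidden activations of the layer below, with no back-propagation."""
    rbms, x = [], data
    for n_hidden in layer_sizes:
        rbm = RBM(x.shape[1], n_hidden)
        for _ in range(epochs):
            rbm.cd1_step(x)
        rbms.append(rbm)
        x = rbm.hidden_probs(x)  # input for the next layer up
    return rbms

# Toy binary data standing in for speech features: 32 samples, 20 dims.
data = (rng.random((32, 20)) > 0.5).astype(float)
dbn = train_dbn(data, [16, 8])

# Propagate the data up through the stack to the top-level representation.
top = data
for rbm in dbn:
    top = rbm.hidden_probs(top)
print(top.shape)  # (32, 8)
```

A simple linear readout (such as the perceptron classifier mentioned above) can then be trained on the activations at any level of the stack to probe how separable the representations are, without fine-tuning the RBM weights themselves.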