Sparse auto-associative neural networks: theory and application to speech recognition

This paper introduces the sparse auto-associative neural network (SAANN), in which the output of the internal hidden layer is forced to be sparse. This is achieved by adding a sparse regularization term to the original reconstruction error cost function and updating the parameters of the network to minimize the overall cost. We show the applicability of this network to phoneme recognition by extracting sparse hidden layer outputs, which are used as features, from a network trained in an unsupervised manner on perceptual linear prediction (PLP) cepstral coefficients. Experiments with the SAANN features on a state-of-the-art TIMIT phoneme recognition system show a relative improvement in phoneme error rate of 5.1% over the baseline PLP features.
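The overall cost described above can be illustrated with a minimal sketch. The abstract does not specify the form of the sparse regularization term, so the L1 penalty on the hidden activations, the tanh hidden layer, the linear output layer, the layer sizes, and all names below (`saann_cost`, `sparsity_weight`, the random stand-in input) are assumptions made for illustration only, not the authors' implementation.

```python
import numpy as np

def saann_cost(x, params, sparsity_weight=0.1):
    """Reconstruction error plus a sparsity penalty on the hidden layer.

    Hypothetical sketch: the L1 penalty and tanh nonlinearity are assumed,
    since the abstract does not state the exact regularizer or architecture.
    """
    w_enc, b_enc, w_dec, b_dec = params
    hidden = np.tanh(x @ w_enc + b_enc)   # hidden-layer output, used as features
    recon = hidden @ w_dec + b_dec        # linear reconstruction of the input
    # Mean squared reconstruction error over the batch of frames.
    recon_err = np.mean(np.sum((recon - x) ** 2, axis=1))
    # Assumed sparsity term: mean L1 norm of the hidden activations.
    sparsity = np.mean(np.sum(np.abs(hidden), axis=1))
    return recon_err + sparsity_weight * sparsity, hidden

# Random stand-in for a batch of acoustic feature frames (sizes are assumptions).
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 39))
params = (
    0.1 * rng.standard_normal((39, 20)), np.zeros(20),   # encoder weights, bias
    0.1 * rng.standard_normal((20, 39)), np.zeros(39),   # decoder weights, bias
)
cost, features = saann_cost(x, params)
```

In such a setup, training would minimize `cost` with respect to `params` by gradient descent, and `features` would be passed to the recognizer in place of the input features.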
