Investigation into bottle-neck features for meeting speech recognition

This work investigates into recently proposed Bottle-Neck features for ASR. The bottle-neck ANN structure is imported into Split Context architecture gaining significant WER reduction. Further, Universal Context architecture was developed which simplifies the system by using only one universal ANN for all temporal splits. Significant WER reduction can be obtained by applying fMPE on top of our BN features as a technique for discriminative feature extraction and further gain is also obtained by retraining model parameters using MPE criterion. The results are reported on meeting data from RT07 evaluation.

[1]  Daniel Povey,et al.  Improvements to fMPE for discriminative training of features , 2005, INTERSPEECH.

[2]  Jan Cernocký,et al.  Probabilistic and Bottle-Neck Features for LVCSR of Meetings , 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.

[3]  Frantisek Grézl,et al.  Optimizing bottle-neck features for lvcsr , 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.

[4]  Daniel P. W. Ellis,et al.  Tandem connectionist feature extraction for conventional HMM systems , 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).

[5]  Hynek Hermansky,et al.  TRAPS - classifiers of temporal patterns , 1998, ICSLP.

[6]  Nelson Morgan,et al.  Learning long-term temporal features in LVCSR using neural networks , 2004, INTERSPEECH.

[7]  Lukás Burget,et al.  The AMI System for the Transcription of Speech in Meetings , 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.

[8]  Richard M. Schwartz,et al.  Recent progress on the discriminative region-dependent transform for speech feature extraction , 2006, INTERSPEECH.

[9]  Nelson Morgan,et al.  Tonotopic multi-layered perceptron: a neural network for learning long-term temporal features for speech recognition , 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005..

[10]  Pavel Matejka,et al.  Hierarchical Structures of Neural Networks for Phoneme Recognition , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[11]  Hynek Hermansky,et al.  Local averaging and differentiating of spectral plane for TRAP-based ASR , 2003, INTERSPEECH.

[12]  Pavel Matejka,et al.  Towards Lower Error Rates in Phoneme Recognition , 2004, TSD.

[13]  Andreas Stolcke,et al.  The ICSI-SRI Spring 2006 Meeting Recognition System , 2006, MLMI.