Optimizing bottle-neck features for LVCSR

This work continues the development of the recently proposed bottle-neck features for ASR. A five-layer MLP used in bottleneck feature extraction allows obtaining an arbitrary feature size without dimensionality-reducing transforms, independently of the MLP training targets. The MLP topology (number and sizes of layers), suitable training targets, the impact of output feature transforms, the need for delta features, and the dimensionality of the final feature vector are studied with respect to the best ASR result. Optimized features are employed in three LVCSR tasks: Arabic broadcast news, English conversational telephone speech, and English meetings. Improvements over standard cepstral features and probabilistic MLP features are shown for different tasks and different neural net input representations. A significant improvement is observed when phoneme MLP training targets are replaced by phoneme states and when delta features are added.
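To make the architecture concrete, below is a minimal sketch of a five-layer bottleneck MLP in PyTorch. All layer sizes, the sigmoid nonlinearity, and the class `BottleneckMLP` are illustrative assumptions, not the paper's exact configuration; the point is that the bottleneck width fixes the output feature dimensionality regardless of the number of training targets.

```python
import torch
import torch.nn as nn

class BottleneckMLP(nn.Module):
    """Sketch of a five-layer MLP with a narrow bottleneck layer.

    Layer sizes are hypothetical placeholders: n_inputs could be a stacked
    context of spectral frames, n_targets could be phonemes or phoneme states.
    """

    def __init__(self, n_inputs=351, n_hidden=1500, n_bottleneck=30, n_targets=135):
        super().__init__()
        # Layers up to and including the bottleneck. The bottleneck size
        # (n_bottleneck) determines the extracted feature dimension,
        # independently of n_targets.
        self.pre = nn.Sequential(
            nn.Linear(n_inputs, n_hidden),
            nn.Sigmoid(),
            nn.Linear(n_hidden, n_bottleneck),
        )
        # Layers after the bottleneck, used only while training the net
        # against classification targets; discarded at extraction time.
        self.post = nn.Sequential(
            nn.Sigmoid(),
            nn.Linear(n_bottleneck, n_hidden),
            nn.Sigmoid(),
            nn.Linear(n_hidden, n_targets),  # softmax applied inside the loss
        )

    def forward(self, x):
        # Full forward pass for training (e.g., with nn.CrossEntropyLoss).
        return self.post(self.pre(x))

    def extract(self, x):
        # Bottleneck activations serve as the features for the recognizer.
        return self.pre(x)
```

In such a setup, the bottleneck activations would be taken as the feature stream for the LVCSR system, optionally post-processed (e.g., decorrelated and augmented with deltas, as studied in the paper) before HMM training.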