Paper information - Transcribing broadcast data using MLP features

Transcribing broadcast data using MLP features

This paper describes incorporating discriminative features from a multilayer perceptron (MLP) into a state-of-the-art Arabic broadcast data transcription system based on cepstral features. The MLP features are based on a recently proposed Bottle-Neck architecture with a long-term warped LP-TRAP speech representation at the input. It is shown that the previously reported improvements on a development Arabic transcription system carry through to a full system at a state-of-the-art level. SAT, CMLLR and MLLR adaptation techniques are shown to be useful for both MLP and combined features, though to a lesser degree than for PLPs. Without adaptation, MLP features obtain superior performance to cepstral features in all test conditions, and with adaptation both feature sets give comparable results. Combining the features, either by feature concatenation or by combining system hypotheses, gives significant gains. Gains from MMI model training seem to be additive to the gain coming from discriminative MLP features.
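As a rough illustration of the feature concatenation mentioned in the abstract, the sketch below joins two frame-aligned feature matrices along the feature axis. The function name `concatenate_features`, the 39-dimensional stream sizes, and the random data are assumptions made for illustration, not details taken from the paper.

```python
import numpy as np


def concatenate_features(plp: np.ndarray, mlp: np.ndarray) -> np.ndarray:
    """Frame-wise concatenation of two feature streams shaped (frames, dims)."""
    if plp.ndim != 2 or mlp.ndim != 2:
        raise ValueError("both feature streams must be 2-D (frames, dims)")
    if plp.shape[0] != mlp.shape[0]:
        raise ValueError("feature streams must have the same number of frames")
    # Each output frame holds the PLP vector followed by the MLP vector.
    return np.hstack([plp, mlp])


# Hypothetical data: 100 frames, 39 dimensions per stream (assumed sizes).
rng = np.random.default_rng(0)
plp = rng.standard_normal((100, 39))
mlp = rng.standard_normal((100, 39))

combined = concatenate_features(plp, mlp)
print(combined.shape)
```

The combined matrix keeps one row per frame, so it can be passed to the same acoustic model training pipeline as either stream alone, only with a larger feature dimension.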
