Design of fast LVCSR systems

This paper describes the development of fast (less than 10 times real-time) large vocabulary continuous speech recognition (LVCSR) systems. They are based on technology developed for the unlimited-runtime systems assembled for recent DARPA/NIST LVCSR evaluations. A general system structure for 10 times real-time systems is proposed, and two specific systems built for broadcast news (BN) and conversational telephone speech (CTS) recognition are described. The systems were evaluated in the DARPA/NIST April 2003 Rich Transcription evaluation. Results are reported and contrasted with those of unlimited-runtime systems and previous fast systems.
