The Philips Research system for continuous-speech recognition

This paper gives an overview of the Philips Research system for continuous-speech recognition. The recognition architecture is based on an integrated statistical approach. The system has been applied successfully to various tasks in American English and German, ranging from small-vocabulary to very-large-vocabulary tasks and from recognition only to speech understanding. Here, we concentrate on phoneme-based continuous-speech recognition for large-vocabulary dictation, which covers a significant part of our research on speech recognition. We describe this task and report experimental results. To allow comparison with the performance of other systems, we include an evaluation on the standard North American Business news (NAB2) task (dictation of American English newspaper text).
