Multilingual large vocabulary speech recognition: the European SQALE project

Abstract: This paper describes the SQALE project, in which the ARPA large vocabulary evaluation paradigm was adapted to meet the needs of European multilingual speech recognition development. The project involved establishing a framework for sharing training and test materials, defining common protocols for training and testing systems, developing systems, running an evaluation, and analysing the results. The specifically multilingual issues addressed included the impact of the language on corpus and test set design, transcription issues, evaluation metrics, recognition system design, cross-system and cross-language performance, and results analysis. The project started in December 1993 and finished in September 1995. The paper describes the evaluation framework and the results obtained. The overall conclusion of the project was that the same general approach to recognition system design is applicable to all the languages studied, although there were some language-specific problems to solve. The evaluation paradigm used within ARPA could be applied in the European context with little difficulty, and the consequent sharing of training and test materials and language-specific expertise amongst the sites was highly beneficial.
