Speech-To-Text Conversion in French

Speech-to-text conversion of French requires that both acoustic-level recognition and language modeling be tailored to the French language. Work in this area began at LIMSI over 10 years ago. This paper summarizes the ongoing research in this direction. It covers studies on the distributional properties of French text materials; problems in speech-to-text conversion that are specific to French; studies in phoneme-to-grapheme conversion for continuous, error-free phonemic strings; past work on isolated-word speech-to-text conversion; and more recent work on continuous-speech speech-to-text conversion. The use of phone recognition for both language and speaker identification is also demonstrated. Continuous speech-to-text conversion for French is based on a speaker-independent, vocabulary-independent recognizer. Phone recognition and word recognition results are reported for this recognizer, evaluated on read speech taken from the BREF corpus. The recognizer was trained on over 4 hours of speech from 57 speakers and tested on sentences from an independent set of 19 speakers. A phone accuracy of 78.7% was obtained using a set of 35 phones. With a word-pair grammar, the word accuracy was 88% for a 1139-word lexicon and 86% for a 2716-word lexicon, with respective perplexities of 100 and 160. With a bigram grammar, word accuracies of 85.5% and 81.7% were obtained for 5K-word and 20K-word vocabularies, with respective perplexities of 122 and 205.
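The abstract does not state how phone and word accuracy are scored. As a minimal sketch, assuming the convention common in speech recognition evaluation (accuracy = 1 - (substitutions + deletions + insertions) / reference length, with errors counted on a minimum-edit-distance alignment), the figure could be computed as follows. The function names and the example sentences are illustrative and do not come from the paper or the BREF corpus.

```python
def edit_counts(ref, hyp):
    """Return (substitutions, deletions, insertions) for a minimum-cost
    alignment of the hypothesis token list against the reference."""
    rows, cols = len(ref) + 1, len(hyp) + 1
    # dp[i][j] = (total_cost, subs, dels, ins) for ref[:i] versus hyp[:j]
    dp = [[None] * cols for _ in range(rows)]
    dp[0][0] = (0, 0, 0, 0)
    for i in range(1, rows):
        dp[i][0] = (i, 0, i, 0)      # only deletions
    for j in range(1, cols):
        dp[0][j] = (j, 0, 0, j)      # only insertions
    for i in range(1, rows):
        for j in range(1, cols):
            c, s, d, n = dp[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diag = (c, s, d, n)              # match, no cost
            else:
                diag = (c + 1, s + 1, d, n)      # substitution
            c, s, d, n = dp[i - 1][j]
            deletion = (c + 1, s, d + 1, n)
            c, s, d, n = dp[i][j - 1]
            insertion = (c + 1, s, d, n + 1)
            # Tuples compare on total cost first, so the minimum is optimal.
            dp[i][j] = min(diag, deletion, insertion)
    _, subs, dels, ins = dp[-1][-1]
    return subs, dels, ins


def accuracy(ref, hyp):
    """Assumed scoring convention: 1 - (S + D + I) / N, N = len(ref)."""
    if not ref:
        raise ValueError("reference must contain at least one token")
    subs, dels, ins = edit_counts(ref, hyp)
    return 1.0 - (subs + dels + ins) / len(ref)


# Illustrative sentences only, not taken from BREF.
reference = "le chat dort sur le lit".split()
hypothesis = "le chat dort sous le petit lit".split()
print(edit_counts(reference, hypothesis), accuracy(reference, hypothesis))
```

The same routine applies to phone accuracy when the token lists hold phone labels instead of words. Because insertions are penalized, accuracy under this convention can fall below the percentage of correctly recognized tokens.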
