Directory name retrieval over the telephone in the Picasso project

The European project Picasso aims to develop and test several telematics transaction services accessible via the worldwide telephone network. Within this framework, ENST is developing an automatic speech recognition system for spoken and spelled names in telephone-quality French speech. The recognizer is based on hidden Markov modeling of speech units, using word models for spelled letters and phone models for name pronunciation. Bigram probabilities over phonemes and letters are introduced at this stage to improve decoding quality. The directory was built automatically from the list of names in the database, using a grapheme-to-phoneme converter for the names and rules for the spellings; each directory entry consists of several pronunciation and spelling variants. After the acoustic recognition phase, the corresponding directory entry is found by dynamic alignment of symbol sequences, with insertion, deletion, and substitution costs estimated from the training data to account for acoustic confusability. Because this lexical search is very time-consuming for large directories, we present a faster method that uses pre-selection in a tree-based representation of the lexicon. A rescoring strategy applied to the 10 best outputs is also evaluated.
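The lexical access step described above, dynamic alignment of a recognized symbol sequence against every variant of every directory entry, can be illustrated with a weighted edit distance. The following is a minimal sketch and not the authors' implementation: the function names `alignment_cost` and `best_entry`, the toy directory, and all cost values are hypothetical. In the paper the costs are estimated from training data, and a tree-based pre-selection replaces the exhaustive loop shown here.

```python
from typing import Dict, List, Optional, Sequence, Tuple

SubCosts = Dict[Tuple[str, str], float]


def alignment_cost(recognized: Sequence[str], reference: Sequence[str],
                   sub_cost: SubCosts,
                   ins_cost: float = 1.0, del_cost: float = 1.0) -> float:
    """Weighted edit distance between a recognized and a reference sequence."""
    n, m = len(recognized), len(reference)
    # dp[i][j] = minimal cost of aligning recognized[:i] with reference[:j]
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] + ins_cost   # extra symbol in the recognizer output
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] + del_cost   # reference symbol missed by the recognizer
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            a, b = recognized[i - 1], reference[j - 1]
            # Confusable pairs get a cost below the default of 1.0 (assumed values).
            sub = 0.0 if a == b else sub_cost.get((a, b), 1.0)
            dp[i][j] = min(dp[i - 1][j - 1] + sub,
                           dp[i - 1][j] + ins_cost,
                           dp[i][j - 1] + del_cost)
    return dp[n][m]


def best_entry(recognized: Sequence[str],
               directory: Dict[str, List[Sequence[str]]],
               sub_cost: SubCosts) -> Tuple[Optional[str], float]:
    """Exhaustively score every variant of every entry and keep the cheapest."""
    best_name, best_cost = None, float("inf")
    for name, variants in directory.items():
        for variant in variants:
            cost = alignment_cost(recognized, variant, sub_cost)
            if cost < best_cost:
                best_name, best_cost = name, cost
    return best_name, best_cost


# Toy directory: each entry holds several spelling variants (hypothetical data).
directory = {
    "DUPONT": [list("DUPONT"), list("DUPOND")],
    "DURAND": [list("DURAND"), list("DURANT")],
}
# Illustrative confusion costs for acoustically similar spelled letters.
sub_cost = {("B", "D"): 0.2, ("P", "T"): 0.3, ("M", "N"): 0.2}

name, cost = best_entry(list("BUPONT"), directory, sub_cost)
print(name, cost)
```

The exhaustive loop in `best_entry` grows linearly with the number of directory variants, which is the expense that motivates pre-selection in a tree-structured lexicon.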
