Towards Fully Automatic Speech Processing Techniques for Interactive Voice Servers

Automatic Speech Processing (Speech Recognition, Coding, Synthesis, Language Identification, Speaker Verification, Interpreting Telephony, etc.) has progressed to a level that allows its integration into Interactive Voice Servers (IVS). A description of a personal telephone attendant ('Majordome') highlights some of the issues in the development of IVS. In particular, users should be able to converse with automatic systems over the telephone in their native language. To achieve this goal, we propose an approach called ALISP (Automatic Language Independent Speech Processing). The need for ALISP is justified and some of the corresponding tools are described. Applications to very low bit-rate coders, automatic speech recognition and speaker verification illustrate our proposal.
