Quantization of cepstral parameters for speech recognition over the World Wide Web

We examine alternative architectures for a client-server model of speech-enabled applications over the World Wide Web (WWW). We compare two models. In the first, a server-only processing model, the client encodes the speech signal and transmits it to the server. In the second, the recognition front end runs locally at the client, which encodes the cepstral coefficients and transmits them to the recognition server over the Internet. We follow a novel encoding paradigm that aims to maximize recognition performance rather than perceptual reproduction. We find that transmitting the cepstral coefficients yields significantly higher recognition performance at a fraction of the bit rate required to encode the speech signal directly. A bit rate of just 2000 bits per second is enough to match the recognition performance obtained with high-quality unquantized speech.
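To make the 2000 bits per second figure concrete, the sketch below converts it into a per-frame bit budget and spends that budget on a simple per-coefficient uniform scalar quantizer. The client packs the quantizer indices into one integer per frame, and the server unpacks them and reconstructs approximate cepstra. Everything here is an illustrative assumption, not the authors' coding scheme, which this abstract does not specify. The assumptions include the 100 frames/s rate (10 ms frame shift), the 13 cepstral coefficients, the hypothetical `BIT_ALLOCATION` that favors low-order cepstra, the mean ± 3 standard deviation quantizer ranges, and the helper names `bits_per_frame`, `fit_ranges`, `encode_frame` and `decode_frame`. Random numbers stand in for real training cepstra.

```python
import numpy as np

# Assumptions (not from the paper): 10 ms frame shift and 13 cepstra per frame.
FRAME_RATE_HZ = 100
NUM_CEPSTRA = 13
# Hypothetical bit allocation (20 bits total): more bits for low-order cepstra.
BIT_ALLOCATION = [3, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1]


def bits_per_frame(bit_rate_bps, frame_rate_hz=FRAME_RATE_HZ):
    """Bit budget available for one feature frame at a given channel rate."""
    return bit_rate_bps / frame_rate_hz


def fit_ranges(train, width=3.0):
    """Per-coefficient quantizer range: mean +/- width standard deviations."""
    mu, sd = train.mean(axis=0), train.std(axis=0)
    return mu - width * sd, mu + width * sd


def encode_frame(frame, lo, hi, bits=BIT_ALLOCATION):
    """Client side: uniformly quantize each coefficient, pack indices into one int."""
    packed = 0
    for x, lo_k, hi_k, nb in zip(frame, lo, hi, bits):
        levels = 1 << nb
        step = (hi_k - lo_k) / levels
        # Values outside [lo_k, hi_k) are clipped to the outermost cells.
        idx = int(np.clip(np.floor((x - lo_k) / step), 0, levels - 1))
        packed = (packed << nb) | idx
    return packed


def decode_frame(packed, lo, hi, bits=BIT_ALLOCATION):
    """Server side: unpack indices (last coefficient first), return cell midpoints."""
    out = np.empty(len(bits))
    for i in range(len(bits) - 1, -1, -1):
        nb = bits[i]
        idx = packed & ((1 << nb) - 1)
        packed >>= nb
        step = (hi[i] - lo[i]) / (1 << nb)
        out[i] = lo[i] + (idx + 0.5) * step
    return out


rng = np.random.default_rng(0)
train = rng.normal(size=(1000, NUM_CEPSTRA))  # stand-in for real cepstral vectors
lo, hi = fit_ranges(train)
code = encode_frame(train[0], lo, hi)
recon = decode_frame(code, lo, hi)
print(bits_per_frame(2000), sum(BIT_ALLOCATION), code < 2**20)  # → 20.0 20 True
```

Under these assumptions, 2000 bits/s leaves about 1.5 bits per coefficient. This is why the allocation across coefficients, and the choice of quantizer (scalar or vector, uniform or trained), matters more than it would at speech-coding rates.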
