RECOGNITION-COMPATIBLE SPEECH COMPRESSION FOR STORED SPEECH

Two important components of a speech archiving system are the compression scheme and the search facility. We investigate two ways of providing these components. The first is to run the recogniser directly on the compressed speech: we show that even with a 2.4 kbit/s codec it is possible to produce good recognition results, but the search is slow. The second is to preprocess the speech and store the extra data in compressed form alongside the speech. In the case of an RNN-HMM hybrid system, the posterior probabilities provide a suitable intermediate data format. Vector quantizing these at just 625 bit/s enables the search to run at many times real time and still maintain good recognition
