Segmental vocoder: going beyond the phonetic approach

This paper addresses very low bit rate segmental speech coding. The basic units are derived automatically from the training database using temporal decomposition, vector quantization and multigrams, and are modelled by HMMs. Coding is based on recognition and synthesis. In single-speaker tests, we obtained intelligible and natural-sounding speech at a mean rate of 211.2 b/s. Finally, we discuss future extensions of the scheme (diphone-like synthesis and speaker adaptation) and the possible use of automatically derived units in recognition.
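
To illustrate the multigram step named in the abstract, here is a minimal sketch of segmenting a string of vector-quantization indices into variable-length units by maximizing the product of unit probabilities with a Viterbi-style search. It is not the authors' implementation: the function name `segment_multigrams`, the `unit_probs` dictionary and the probability values are hypothetical, and a real system would estimate the unit probabilities iteratively from the training database.

```python
import math

def segment_multigrams(symbols, unit_probs, max_len=3):
    """Find the most likely split of `symbols` into units of length <= max_len.

    `unit_probs` maps a tuple of symbols to its probability (hypothetical
    values here). Units are assumed independent, so the likelihood of a
    segmentation is the product of its unit probabilities.
    """
    n = len(symbols)
    best = [-math.inf] * (n + 1)  # best[i]: best log-probability of symbols[:i]
    back = [0] * (n + 1)          # back[i]: start index of the last unit ending at i
    best[0] = 0.0
    for end in range(1, n + 1):
        for length in range(1, min(max_len, end) + 1):
            start = end - length
            p = unit_probs.get(tuple(symbols[start:end]))
            if p is None or best[start] == -math.inf:
                continue  # unit not in the dictionary, or prefix not reachable
            score = best[start] + math.log(p)
            if score > best[end]:
                best[end] = score
                back[end] = start
    if best[n] == -math.inf:
        raise ValueError("no segmentation covers the input with the given units")
    # Trace back from the end to recover the unit sequence.
    units = []
    end = n
    while end > 0:
        start = back[end]
        units.append(tuple(symbols[start:end]))
        end = start
    units.reverse()
    return units, best[n]

# Hypothetical dictionary over VQ indices 1..3 (illustrative values only).
example_probs = {(1,): 0.1, (2,): 0.1, (3,): 0.2, (1, 2): 0.5, (1, 2, 3): 0.05}
example_units, example_logp = segment_multigrams([1, 2, 3, 1, 2], example_probs)
```

With these illustrative probabilities, the search prefers the split (1, 2), (3), (1, 2) over (1, 2, 3), (1, 2), because the product of the unit probabilities is larger for the former.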
