Stochastic Modeling and Quantization of Harmonic Phases in Speech using Wrapped Gaussian Mixture Models

Harmonic sinusoidal representations of speech have proven useful in many speech processing tasks. This work focuses on the phase spectra of the harmonics and provides a methodology for analyzing and then modeling the statistics of the harmonic phases. To this end, we propose the use of a wrapped Gaussian mixture model (WGMM), a model suited to random variables that lie on circular spaces, and provide an expectation-maximization algorithm for training it. The WGMM is then used to construct a phase quantizer. The quantizer is employed in a prototype variable-rate narrowband VoIP sinusoidal codec that is equivalent to iLBC in terms of PESQ-MOS at approximately 13 kbps.
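
To make the notion of a mixture model on a circular space concrete, the following is a minimal Python sketch of a univariate WGMM density. It assumes the standard textbook definition of a wrapped Gaussian (a Gaussian summed over all shifts by multiples of 2π) and approximates the infinite sum by truncating it. The function names, the `n_wraps` truncation parameter, and the example parameters are illustrative assumptions and are not taken from the paper, whose model and training procedure may differ in detail.

```python
import math


def wrapped_gaussian_pdf(theta, mu, sigma, n_wraps=3):
    """Wrapped Gaussian density at angle theta (radians).

    The infinite sum over wraps is truncated to k = -n_wraps..n_wraps,
    which is an approximation that is accurate when sigma is small
    relative to 2*pi*n_wraps.
    """
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return norm * sum(
        math.exp(-0.5 * ((theta - mu + 2.0 * math.pi * k) / sigma) ** 2)
        for k in range(-n_wraps, n_wraps + 1)
    )


def wgmm_pdf(theta, weights, mus, sigmas, n_wraps=3):
    """Mixture of wrapped Gaussians; weights are assumed to sum to 1."""
    return sum(
        w * wrapped_gaussian_pdf(theta, m, s, n_wraps)
        for w, m, s in zip(weights, mus, sigmas)
    )


# Hypothetical two-component mixture, for illustration only.
weights = [0.6, 0.4]
mus = [3.0, -1.0]
sigmas = [0.5, 0.8]

# Numerically integrate the density over one period [-pi, pi).
n_points = 2000
step = 2.0 * math.pi / n_points
total_mass = step * sum(
    wgmm_pdf(-math.pi + i * step, weights, mus, sigmas)
    for i in range(n_points)
)
```

Because the first component is centered near π, part of its mass wraps around to the neighborhood of -π. An ordinary Gaussian mixture on the real line would not capture this, which is why a wrapped model suits phase data.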
