Wrapped Gaussian Mixture Models for Modeling and High-Rate Quantization of Phase Data of Speech

The harmonic representation of speech signals has found many applications in speech processing. This paper presents a novel statistical approach to model the behavior of harmonic phases. Phase information is decomposed into three parts: a minimum phase part, a translation term, and a residual term referred to as dispersion phase. Dispersion phases are modeled by wrapped Gaussian mixture models (WGMMs) using an expectation-maximization algorithm suitable for circular vector data. A multivariate WGMM-based phase quantizer is then proposed and constructed using novel scalar quantizers for circular random variables. The proposed phase modeling and quantization scheme is evaluated in the context of a narrowband harmonic representation of speech. Results indicate that it is possible to construct a variable-rate harmonic codec that is equivalent to iLBC at approximately 13 kbps.
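As a rough illustration of the wrapped Gaussian mixture idea mentioned above, the following minimal sketch evaluates a univariate wrapped Gaussian density and a mixture of such densities. It is not the paper's implementation: the paper works with multivariate circular vector data and fits the model with an expectation-maximization algorithm, whereas this sketch only evaluates a scalar density with fixed parameters. The function names, the parameter names, and the truncation of the infinite wrapping sum to `n_wraps` terms on each side are all assumptions made for illustration.

```python
import math


def wrapped_gaussian_pdf(theta, mu, sigma, n_wraps=3):
    """Approximate density of a wrapped Gaussian at angle theta (radians).

    The exact density sums a Gaussian over all 2*pi shifts of theta;
    here the sum is truncated to k = -n_wraps..n_wraps (an assumption
    that is adequate when sigma is small relative to 2*pi*n_wraps).
    """
    total = 0.0
    for k in range(-n_wraps, n_wraps + 1):
        z = (theta - mu + 2.0 * math.pi * k) / sigma
        total += math.exp(-0.5 * z * z)
    return total / (sigma * math.sqrt(2.0 * math.pi))


def wgmm_pdf(theta, weights, means, sigmas, n_wraps=3):
    """Density of a univariate wrapped Gaussian mixture at angle theta.

    weights are assumed to be non-negative and to sum to one.
    """
    return sum(
        w * wrapped_gaussian_pdf(theta, m, s, n_wraps)
        for w, m, s in zip(weights, means, sigmas)
    )


if __name__ == "__main__":
    # Hypothetical two-component mixture; the values are illustrative only.
    weights, means, sigmas = [0.3, 0.7], [-3.0, 1.0], [0.5, 0.8]
    for angle in (-math.pi, 0.0, 1.0, math.pi):
        print(angle, wgmm_pdf(angle, weights, means, sigmas))
```

Because the density is periodic in `theta`, a component centered near -π also places mass near +π, which an ordinary Gaussian on the real line would not capture for phase data.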
