Instantaneous Harmonic Analysis: Techniques and Applications to Speech Signal Processing

Parametric speech modeling is a key issue in various processing applications such as text-to-speech synthesis, voice morphing, and voice conversion. Building an adequate parametric model is a complicated problem given the time-varying nature of speech. This paper gives an overview of tools for instantaneous harmonic analysis and shows how they can be applied to stationary, frequency-modulated, and quasiperiodic signals in order to extract and manipulate instantaneous pitch, excitation, and spectral envelope.
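The abstract does not specify which estimators the paper uses. As a rough illustration of what "instantaneous" harmonic parameters mean, the sketch below uses one standard, generic tool: the analytic signal (discrete Hilbert transform). Its magnitude gives instantaneous amplitude, and the derivative of its unwrapped phase gives instantaneous frequency. This is an assumed stand-in, not the authors' method. It is only valid for a single narrowband component, so a quasiperiodic speech signal would first need each harmonic isolated (for example by band-pass filtering). The function names `analytic_signal`, `instantaneous_frequency` and `instantaneous_amplitude` are hypothetical.

```python
import numpy as np


def analytic_signal(x):
    """Discrete analytic signal built in the frequency domain.

    Keeps DC (and Nyquist for even lengths), doubles positive frequencies,
    and zeroes negative frequencies; the same construction scipy.signal.hilbert uses.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    spectrum = np.fft.fft(x)
    h = np.zeros(n)
    if n % 2 == 0:
        h[0] = h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[0] = 1.0
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * h)


def instantaneous_frequency(x, fs):
    """Instantaneous frequency in Hz (length len(x) - 1) of a single-component signal.

    Computed as the first difference of the unwrapped analytic phase, scaled by fs / (2*pi).
    """
    phase = np.unwrap(np.angle(analytic_signal(x)))
    return np.diff(phase) * fs / (2.0 * np.pi)


def instantaneous_amplitude(x):
    """Instantaneous amplitude (envelope) as the magnitude of the analytic signal."""
    return np.abs(analytic_signal(x))


if __name__ == "__main__":
    fs = 16000
    t = np.arange(fs) / fs
    # Frequency-modulated test tone: linear chirp with f(t) = 200 + 100 t Hz, amplitude 0.5.
    x = 0.5 * np.cos(2 * np.pi * (200 * t + 50 * t**2))
    f_inst = instantaneous_frequency(x, fs)
    amp = instantaneous_amplitude(x)
    print(f_inst[fs // 2], amp[fs // 2])
```

Estimates near the ends of the analysis window are less reliable, because the FFT-based transform treats the frame as periodic. For a stationary sinusoid that fits the frame exactly, the estimate is constant. For the chirp, the estimate tracks the linear frequency sweep away from the frame edges.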