Phase-Based Methods for Voice Source Analysis

Voice source analysis is an important but difficult problem in speech processing. This talk discusses three aspects of voice source analysis recently developed at LIMSI (Orsay, France) and FPMs (Mons, Belgium). The first part presents time-domain and spectral-domain modelling of glottal flow signals. It is shown that the glottal flow can be modelled as an anticausal (maximum-phase) filter before the glottal closing instant and as a causal (minimum-phase) filter after it. The second part takes advantage of this phase structure: the causal and anticausal components of the speech signal are separated according to the location in the Z-plane of the zeros of the Z-transform (ZZT) of the windowed signal. This method is useful for voice source parameter analysis and source-tract deconvolution. Results of a comparative evaluation of ZZT and linear prediction for source/tract separation are reported. The third part discusses glottal closing instant detection using the phase of the wavelet transform. A method based on the lines of maximum phase in the time-scale plane is proposed and compared with electroglottography (EGG) for robust glottal closing instant analysis.
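The zero-based separation described in the second part can be illustrated with a minimal sketch. It assumes NumPy, and the function name `zzt_split` and the toy frame are hypothetical, chosen only for illustration. The frame is treated as the coefficients of a polynomial, its roots are computed, and the roots are split by their position relative to the unit circle: zeros outside are assigned to the anticausal (maximum-phase) component, zeros inside to the causal (minimum-phase) component. The windowing and frame positioning that the method relies on for real speech are not covered here.

```python
import numpy as np

def zzt_split(frame):
    """Split the zeros of the Z-transform of a frame by unit-circle position.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    frame = np.asarray(frame, dtype=float)
    # X(z) = sum_n x[n] z^(-n) shares its nonzero zeros with the polynomial
    # whose coefficients (highest degree first) are x[0], ..., x[N-1].
    zeros = np.roots(frame)
    outside = zeros[np.abs(zeros) > 1.0]   # maximum-phase (anticausal) zeros
    inside = zeros[np.abs(zeros) <= 1.0]   # minimum-phase (causal) zeros
    return inside, outside

# Toy frame: a minimum-phase sequence convolved with a maximum-phase one.
causal = np.convolve([1.0, -0.5], [1.0, 0.3])   # zeros at 0.5 and -0.3
anticausal = np.array([1.0, -2.0])              # zero at 2.0
frame = np.convolve(causal, anticausal)

inside, outside = zzt_split(frame)

# Rebuild each component (up to a gain factor) from its own zeros.
causal_est = np.real(np.poly(inside))
anticausal_est = np.real(np.poly(outside))
```

On this toy frame the two recovered polynomials match the sequences that were convolved. For real speech frames the polynomial degree is in the hundreds, so root finding becomes costly and sensitive to the choice of window.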
