Speech Analysis Based on Modeling the Effective Voice Source

This paper proposes a system-identification-based method for accurate estimation of vocal tract parameters. A common problem with conventional linear prediction analysis stems from the harmonic structure of the excitation source in voiced speech: this harmonic characteristic becomes coupled with the estimated autoregressive (AR) coefficients, which makes the vocal tract filter difficult to estimate. The proposed method models the effective voice source from the residual obtained by covariance analysis in a first pass, and then uses this source model as the input to a second-pass least-squares analysis, achieving a better source-filter separation. For synthetic vowels, the formant frequencies and corresponding bandwidths obtained with the proposed method are more accurate than those of the conventional method by a factor of more than three (in percent error). Because the source characteristic is taken into account, local variations due to the position of the analysis window are reduced significantly. The validity of the proposed method is also examined by inspecting the spectra obtained from natural vowel sounds uttered by a high-pitched female speaker.
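The two-pass idea can be illustrated with a minimal sketch. The abstract does not specify how the effective voice source is modeled, so the thresholding step below (`effective_source`, with its `rel_threshold` parameter) is a hypothetical stand-in, not the paper's model. The function names, the second-order synthetic resonance at 700 Hz, and the 200 Hz impulse-train excitation are likewise assumptions chosen only for illustration.

```python
import numpy as np

def _past(s, p):
    # Column k-1 holds s[n-k] for n = p..N-1
    N = len(s)
    return np.column_stack([s[p - k:N - k] for k in range(1, p + 1)])

def covariance_lpc(s, p):
    """First pass: covariance-method LP, s[n] ~ -sum_k a[k] s[n-k]."""
    a, *_ = np.linalg.lstsq(-_past(s, p), s[p:], rcond=None)
    return a

def residual(s, a):
    """Prediction residual e[n] = s[n] + sum_k a[k] s[n-k], n = p..N-1."""
    return s[len(a):] + _past(s, len(a)) @ a

def effective_source(e, rel_threshold=0.5):
    """ASSUMPTION: keep only the large, pulse-like residual samples.
    A placeholder for the paper's (unspecified here) source model."""
    return np.where(np.abs(e) >= rel_threshold * np.max(np.abs(e)), e, 0.0)

def second_pass(s, u, p):
    """Second pass: least-squares fit of
    s[n] = -sum_k a[k] s[n-k] + g * u[n], with u aligned to s[p:]."""
    X = np.column_stack([-_past(s, p), u])
    theta, *_ = np.linalg.lstsq(X, s[p:], rcond=None)
    return theta[:p], theta[p]

def synth(a, u):
    """All-pole synthesis: s[n] = u[n] - sum_k a[k] s[n-k]."""
    s = np.zeros(len(u))
    for n in range(len(u)):
        acc = u[n]
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc -= ak * s[n - k]
        s[n] = acc
    return s

# Synthetic "vowel": one resonance at 700 Hz driven by a 200 Hz pulse train
fs, f0, N, p = 8000, 200, 400, 2
r, w = 0.95, 2 * np.pi * 700 / fs
a_true = np.array([-2 * r * np.cos(w), r ** 2])
u_true = np.zeros(N)
u_true[::fs // f0] = 1.0
s = synth(a_true, u_true)

a1 = covariance_lpc(s, p)                        # first pass
u_hat = effective_source(residual(s, a1))        # source from residual
a2, g = second_pass(s, u_hat, p)                 # second pass

err1 = np.linalg.norm(a1 - a_true)
err2 = np.linalg.norm(a2 - a_true)
print("first-pass coefficient error :", err1)
print("second-pass coefficient error:", err2)
```

In this toy setting the first-pass estimate is biased because the periodic excitation is correlated with past samples, while the second pass explains the pulses through the input term. Real speech would need a more realistic source model than simple thresholding.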
