A Novel Method of Glottal Inverse Filtering

This paper presents a new technique for glottal inverse filtering using a distributed model of the vocal tract. A discrete state-space model of the speech production system has been constructed by combining the concatenated tube model of the vocal tract with the Liljencrants-Fant (LF) model of the glottal flow derivative waveform. An adaptive system identification technique based on extended Kalman filtering has been used to estimate the states and model parameters from continuous speech. Because the glottal signal represented by the LF model is only piecewise differentiable within one glottal cycle, the speech production system has been treated as a hybrid system described by a separate model in each of two modes. Multiple-model estimation has been performed by switching between the two models at the mode transitions. The open phase of the glottal cycle has been taken as Mode 1, whereas the return phase and the closed phase together have been taken as Mode 2. The starting point of Mode 1, the glottal opening instant, was estimated by observing formant modulation, which remains negligible during the closed phase and starts to increase at the onset of opening. The starting point of Mode 2, the glottal closing instant, was computed by peak-picking the linear prediction (LP) residual signal. The proposed method estimates the glottal waveform as well as the changes in flow occurring at different sections of the vocal tract during speech production. Compared with other LP-based methods in the literature, the technique was found to be accurate and robust to variations in pitch. The method also estimates the air pressure distribution at different sections of the vocal tract.
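The concatenated tube model can be illustrated with a minimal Kelly-Lochbaum sketch in Python (NumPy). It assumes lossless sections with a one-sample delay in each direction, junction reflection coefficients r_k = (A_{k+1} - A_k)/(A_{k+1} + A_k) for volume-velocity waves, and scalar reflection coefficients at the glottis and lips. The function names, boundary values, area function and normalized ρc = 1 are hypothetical choices for illustration, not the paper's state-space formulation. The forward and backward wave variables act as the state, and the per-section flow and pressure outputs are the kind of quantities the abstract says the method estimates.

```python
import numpy as np

def reflection_coeffs(areas):
    """Junction reflection coefficients r_k = (A[k+1] - A[k]) / (A[k+1] + A[k])."""
    A = np.asarray(areas, dtype=float)
    return (A[1:] - A[:-1]) / (A[1:] + A[:-1])

def simulate_tube(areas, u_glottis, r_glottis=1.0, r_lips=0.8, rho_c=1.0):
    """Kelly-Lochbaum tube driven by a glottal volume velocity (hypothetical helper).

    Each section delays waves by one sample in each direction. Returns the
    lip output flow and, per sample, the volume velocity and (normalized)
    pressure at the glottal end of every section.
    """
    A = np.asarray(areas, dtype=float)
    r = reflection_coeffs(A)
    N, T = len(A), len(u_glottis)
    fwd = np.zeros(N)        # forward wave at the glottal end of each section
    bwd_right = np.zeros(N)  # backward wave at the lip end of each section
    u_lips = np.zeros(T)
    flows = np.zeros((T, N))
    pressures = np.zeros((T, N))
    for n in range(T):
        f_right, b_left = fwd, bwd_right   # waves after one sample of travel
        new_fwd = np.empty(N)
        new_bwd = np.empty(N)
        # Scattering at the junction between sections k and k+1
        new_fwd[1:] = (1 + r) * f_right[:-1] + r * b_left[1:]
        new_bwd[:-1] = -r * f_right[:-1] + (1 - r) * b_left[1:]
        # Glottal boundary: r_glottis = 1 gives an ideal flow source
        new_fwd[0] = 0.5 * (1 + r_glottis) * u_glottis[n] + r_glottis * b_left[0]
        # Lip boundary: part of the wave is reflected, the rest is radiated
        new_bwd[-1] = -r_lips * f_right[-1]
        u_lips[n] = (1 + r_lips) * f_right[-1]
        fwd, bwd_right = new_fwd, new_bwd
        # Volume velocity u = u+ - u-, pressure p = (rho*c/A) * (u+ + u-)
        flows[n] = fwd - b_left
        pressures[n] = rho_c / A * (fwd + b_left)
    return u_lips, flows, pressures

# Hypothetical 10-section area function (cm^2) driven by a unit step of flow
areas = [1.0, 1.5, 2.0, 3.0, 4.0, 3.0, 2.0, 1.5, 1.0, 2.0]
u_lips, flows, pressures = simulate_tube(areas, np.ones(8000))
print(u_lips[-1], flows[-1])
```

With r_glottis = 1 the source behaves as an ideal flow source and the junctions conserve volume velocity, so for a step input the flow in every section settles to the source flow once the lip losses have damped the resonances.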
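For the glottal side, the sketch below generates one period of the LF glottal flow derivative from the standard LF equations: an exponentially growing sinusoid up to the negative peak -E_e at T_e, followed by an exponential return phase with time constant T_a that ends at T_c. The function name, the default T_c = T_0, the fixed-point solution for ε and the bisection that enforces zero net flow over the cycle are illustrative assumptions; the abstract does not say how the LF parameters enter the authors' state-space model.

```python
import numpy as np

def lf_pulse(fs, T0, Tp, Te, Ta, Tc=None, Ee=1.0):
    """One period of the LF glottal flow derivative, sampled at fs (Hz).

    Times in seconds: T0 period, Tp instant of peak flow, Te instant of the
    negative peak -Ee, Ta return-phase time constant, Tc end of the return
    phase (defaults to T0, i.e. no closed phase).
    """
    Tc = T0 if Tc is None else Tc
    if not (Tp < Te < min(2 * Tp, Tc) and Tc - Te > Ta):
        raise ValueError("sketch assumes Tp < Te < min(2*Tp, Tc) and Tc - Te > Ta")
    wg = np.pi / Tp                 # open-phase angular frequency
    D = Tc - Te                     # duration of the return phase
    s_e = np.sin(wg * Te)           # negative because Tp < Te < 2 * Tp

    # Return-phase constant: solve eps * Ta = 1 - exp(-eps * D) by fixed-point iteration.
    eps = 1.0 / Ta
    for _ in range(200):
        eps = (1.0 - np.exp(-eps * D)) / Ta
    # Net flow change over the return phase (a negative number).
    ret_area = -(Ee / (eps * Ta)) * ((1.0 - np.exp(-eps * D)) / eps - D * np.exp(-eps * D))

    def net_flow(alpha):
        # Closed-form open-phase flow at Te plus the return-phase flow change.
        J = (alpha * s_e - wg * np.cos(wg * Te) + wg * np.exp(-alpha * Te)) / (alpha**2 + wg**2)
        return Ee * J / (-s_e) + ret_area

    # Choose alpha so the flow returns to zero at Tc: bracket, then bisect.
    lo, hi = -1.0 / T0, 1.0 / T0
    while net_flow(lo) < 0:
        lo *= 2.0
    while net_flow(hi) > 0:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if net_flow(mid) > 0:
            lo = mid
        else:
            hi = mid
    alpha = 0.5 * (lo + hi)

    t = np.arange(int(round(T0 * fs))) / fs
    e = np.zeros_like(t)
    op = t <= Te                                  # open phase
    e[op] = Ee * np.exp(alpha * (t[op] - Te)) * np.sin(wg * t[op]) / (-s_e)
    rp = (t > Te) & (t <= Tc)                     # return phase; zero afterwards
    e[rp] = -(Ee / (eps * Ta)) * (np.exp(-eps * (t[rp] - Te)) - np.exp(-eps * D))
    return t, e, alpha, eps

# Hypothetical timing parameters for a 100 Hz voice sampled at 16 kHz
t, e, alpha, eps = lf_pulse(fs=16000, T0=0.01, Tp=0.004, Te=0.005, Ta=0.0003)
print(alpha, eps)
```

The split at T_e mirrors the mode structure described above: samples up to T_e fall in the open phase (Mode 1), and the remaining samples in the return and closed phases (Mode 2).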
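The abstract does not give the filter equations, so the following is only a generic sketch of how an extended Kalman filter can estimate states and model parameters jointly: the unknown parameter is appended to the state vector as a slowly varying random walk, and each step linearizes the model around the current estimate. The first-order toy model, noise levels, initial guess and all names are hypothetical stand-ins for the much larger tube-plus-LF model.

```python
import numpy as np

def ekf_step(x, P, y, f, F_jac, h, H_jac, Q, R):
    """One predict/update cycle of a discrete-time extended Kalman filter."""
    # Predict: push the estimate through the nonlinear model and the
    # covariance through the model Jacobian evaluated at the old estimate.
    F = F_jac(x)
    x_pred = f(x)
    P_pred = F @ P @ F.T + Q
    # Update: linearize the measurement model around the prediction.
    H = H_jac(x_pred)
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    x_new = x_pred + K @ (y - h(x_pred))
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new

# Hypothetical toy model: x[n+1] = a * x[n] + w[n], y[n] = x[n] + v[n].
# The unknown coefficient a is appended to the state: s = [x, a].
def f(s):
    return np.array([s[1] * s[0], s[1]])

def F_jac(s):
    return np.array([[s[1], s[0]], [0.0, 1.0]])

def h(s):
    return s[:1]

def H_jac(s):
    return np.array([[1.0, 0.0]])

Q = np.diag([1.0, 1e-6])   # process noise on x; tiny random walk on a
R = np.array([[0.01]])     # measurement noise variance (std 0.1)

rng = np.random.default_rng(0)
a_true, x_true = 0.9, 0.0
s, P = np.array([0.0, 0.5]), np.eye(2)   # deliberately poor initial guess for a
for _ in range(2000):
    x_true = a_true * x_true + rng.normal()
    y = np.array([x_true + 0.1 * rng.normal()])
    s, P = ekf_step(s, P, y, f, F_jac, h, H_jac, Q, R)
a_hat = s[1]
print(f"estimated a = {a_hat:.3f}")
```

In a switched, multiple-model setting like the one described above, one could pass a different model function, Jacobian and noise covariance to the same step routine in each mode; that switching logic is not shown here.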
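The glottal closing instants are located as peaks of the LP residual. A minimal sketch of that idea follows, assuming autocorrelation-method LPC over the whole signal, a greedy peak-picker with a hypothetical relative threshold and minimum spacing, and a synthetic impulse train passed through a single resonance in place of real speech. Real speech would normally require frame-by-frame LPC; that refinement, and the paper's exact procedure, are not reproduced here.

```python
import numpy as np

def lpc(x, order):
    """Predictor coefficients a[1..p] by the autocorrelation method."""
    r = np.array([np.dot(x[: len(x) - k], x[k:]) for k in range(order + 1)])
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1 : order + 1])

def lp_residual(x, a):
    """Prediction error e[n] = x[n] - sum_k a[k] x[n-k] (inverse filtering)."""
    return np.convolve(x, np.concatenate(([1.0], -a)))[: len(x)]

def pick_gcis(residual, min_dist, threshold=0.5):
    """Greedy peak-picking on |residual|: largest peaks first, at least min_dist apart."""
    mag = np.abs(residual)
    peaks = []
    for n in np.argsort(mag)[::-1]:
        if mag[n] < threshold * mag.max():
            break
        if all(abs(int(n) - p) >= min_dist for p in peaks):
            peaks.append(int(n))
    return sorted(peaks)

# Synthetic "voiced" signal: a 100 Hz impulse train through one resonance (hypothetical values)
fs, period = 8000, 80
exc = np.zeros(800)
exc[40::period] = 1.0
rho, theta = 0.95, 2 * np.pi * 500 / fs
a1, a2 = 2 * rho * np.cos(theta), -rho**2
x = np.zeros_like(exc)
for n in range(len(x)):
    x[n] = exc[n] + (a1 * x[n - 1] if n >= 1 else 0.0) + (a2 * x[n - 2] if n >= 2 else 0.0)

gcis = pick_gcis(lp_residual(x, lpc(x, order=2)), min_dist=period // 2)
print(gcis)
```

Because the synthetic signal is exactly an order-2 all-pole response, the residual is close to the excitation and the picked peaks should coincide with the impulse positions.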
