Advances in Non-Linear Modeling for Speech Processing

Advances in Non-Linear Modeling for Speech Processing covers advanced topics in non-linear estimation and modeling techniques, along with their applications to speaker recognition. A non-linear aeroacoustic modeling approach is used to estimate important fine-structure speech events that are not revealed by the short-time Fourier transform (STFT). This aeroacoustic modeling approach provides the impetus for the high-resolution Teager energy operator (TEO), whose time resolution is fine enough to track rapid changes in signal energy within a glottal cycle. Cepstral features such as linear prediction cepstral coefficients (LPCC) and mel-frequency cepstral coefficients (MFCC) are computed from the magnitude spectrum of the speech frame, and the phase spectrum is neglected. To overcome this loss of phase information, the speech production system can be represented as an amplitude modulation-frequency modulation (AM-FM) model. To demodulate the speech signal, that is, to estimate its amplitude envelope and instantaneous frequency components, the energy separation algorithm (ESA) and the Hilbert transform demodulation (HTD) algorithm are discussed. Features derived using these non-linear modeling techniques are then used to develop a speaker identification system. Finally, it is shown that fusing speech production and speech perception mechanisms can lead to a robust feature set.
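
As a rough illustration of the TEO and energy separation ideas mentioned above, the following Python sketch computes the discrete Teager energy, Psi[x(n)] = x(n)^2 - x(n-1) x(n+1), and then applies one discrete energy separation variant commonly called DESA-2 to recover the amplitude envelope and instantaneous frequency of a test tone. The choice of DESA-2, the function names `teager_energy` and `desa2`, and the synthetic cosine input are assumptions made for this example; the book may use a different ESA variant, and real speech would normally be band-pass filtered around a formant first.

```python
import numpy as np


def teager_energy(x):
    """Discrete Teager energy: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    The output is two samples shorter than the input because the
    first and last samples have no two-sided neighbourhood.
    """
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]


def desa2(x):
    """DESA-2 style energy separation (illustrative sketch).

    Returns (amplitude, omega), where omega is in radians per sample.
    Both outputs are four samples shorter than the input.
    No guarding is done for zero or negative Teager energy, which can
    occur on noisy or wideband signals and would produce inf/nan here.
    """
    x = np.asarray(x, dtype=float)
    y = x[2:] - x[:-2]               # symmetric difference y(n) = x(n+1) - x(n-1)
    psi_y = teager_energy(y)         # energy of the difference signal
    psi_x = teager_energy(x)[1:-1]   # trimmed so it lines up with psi_y
    ratio = np.clip(1.0 - psi_y / (2.0 * psi_x), -1.0, 1.0)
    omega = 0.5 * np.arccos(ratio)             # instantaneous frequency estimate
    amplitude = 2.0 * psi_x / np.sqrt(psi_y)   # amplitude envelope estimate
    return amplitude, omega


if __name__ == "__main__":
    # Synthetic test tone (hypothetical values): A = 0.8, 0.3 rad/sample.
    n = np.arange(200)
    x = 0.8 * np.cos(0.3 * n + 0.1)
    amp, omega = desa2(x)
    print("mean amplitude estimate:", amp.mean())
    print("mean frequency estimate (rad/sample):", omega.mean())
```

For a pure cosine A cos(Omega n + phi), the Teager energy equals A^2 sin^2(Omega) at every sample, which is why only three neighbouring samples are enough to track energy changes within a glottal cycle. The frequency estimate is only unambiguous for Omega below a quarter of the sampling rate, because of the arccos in the last step.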
