Robust formant detection using group delay function and stabilized weighted linear prediction

In this paper, we propose a robust spectral representation for detecting formants in heavily degraded conditions. The method combines the temporal robustness of stabilized weighted linear prediction (SWLP) with the frequency-domain robustness of the group delay (GD) function. Weighting the cost function in linear prediction analysis with the short-time energy of the speech signal improves the robustness of the resulting spectrum. It also improves the accuracy of the estimated resonances, because the weighting function places more weight on the closed phase of the glottal cycle, which is also the high-SNR region of the signal. The group delay spectrum, computed as the sum of the individual resonances given by the roots of the SWLP predictor polynomial, improves the robustness of the weaker higher-order resonances. The proposed SWLP-GD spectrum performs better than the conventional LP spectrum and the STRAIGHT spectrum in terms of spectral distortion and formant detection accuracy.
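The group delay step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the SWLP analysis is not reproduced here, and a synthetic all-pole model with three assumed resonances (500, 1500 and 2500 Hz at an assumed 8 kHz sampling rate) stands in for the SWLP coefficients. The function names `group_delay_from_lp` and `pick_peaks` are hypothetical. For a single pole at radius r and angle θ, the group delay of 1/(1 - r e^{jθ} z^{-1}) is (r cos(ω - θ) - r²) / (1 - 2r cos(ω - θ) + r²), and the spectrum below is the sum of these terms over all roots.

```python
import numpy as np

def group_delay_from_lp(a, n_freq=512):
    """Group delay (in samples) of the all-pole filter 1/A(z), summed pole by pole.

    a: predictor polynomial coefficients [1, a1, ..., ap] (assumed stable).
    Returns the frequency grid in rad/sample on [0, pi] and the summed group delay.
    """
    poles = np.roots(a)
    r = np.abs(poles)[:, None]          # pole radii, one row per pole
    theta = np.angle(poles)[:, None]    # pole angles
    omega = np.linspace(0.0, np.pi, n_freq)[None, :]
    phi = omega - theta
    # Closed-form group delay of each first-order section 1/(1 - p z^-1)
    tau = (r * np.cos(phi) - r**2) / (1.0 - 2.0 * r * np.cos(phi) + r**2)
    return omega.ravel(), tau.sum(axis=0)

def pick_peaks(omega, gd, fs, n_peaks):
    """Return the n_peaks largest interior local maxima of gd, in Hz, ascending."""
    idx = np.flatnonzero((gd[1:-1] > gd[:-2]) & (gd[1:-1] >= gd[2:])) + 1
    idx = idx[np.argsort(gd[idx])[::-1][:n_peaks]]
    return np.sort(omega[idx] * fs / (2.0 * np.pi))

# Toy all-pole model (assumed values, standing in for SWLP coefficients)
fs = 8000.0
formants_hz = np.array([500.0, 1500.0, 2500.0])
radii = np.array([0.97, 0.95, 0.93])    # weaker (broader) higher resonances
upper = radii * np.exp(1j * 2.0 * np.pi * formants_hz / fs)
a = np.real(np.poly(np.concatenate([upper, upper.conj()])))

omega, gd = group_delay_from_lp(a)
est = pick_peaks(omega, gd, fs, n_peaks=3)
print(est)  # peak locations in Hz, close to the assumed resonances
```

Because each pole contributes an additive term, a weak resonance keeps its own peak in the group delay spectrum instead of being masked by the skirts of a stronger neighbour, as can happen in the magnitude spectrum. In practice `a` would come from the SWLP analysis of a speech frame.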
