Robust spectral representation using group delay function and stabilized weighted linear prediction for additive noise degradations

In this paper, we propose a robust spectral representation based on the group delay (GD) function computed from stabilized weighted linear prediction (SWLP) coefficients. Weighting the cost function of linear prediction (LP) analysis over time with the short-term energy of the speech signal improves the robustness of the resulting spectrum. The additive property of the GD function gives a better representation of weaker resonances in the spectrum, thereby improving the robustness of the representation. SWLP thus provides robustness in the temporal domain, whereas the GD function provides robustness in the frequency domain. The proposed SWLP-GD representation is shown to be more robust against different types of additive noise degradation than the widely used discrete Fourier transform (DFT) or LP based representations. In a small-scale closed-set speaker recognition experiment, cepstral features derived from the proposed SWLP-GD spectrum perform better than traditional mel-cepstral features computed from the DFT spectrum under mismatched degradation conditions.
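The two ingredients named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses plain weighted LP with short-term-energy weights and omits the stabilization step that distinguishes SWLP, so the resulting all-pole filter is not guaranteed to be stable. The function names `wlp_coefficients` and `allpole_group_delay`, the covariance-style summation range, and the parameters `ste_len` and `n_fft` are assumptions made for illustration.

```python
import numpy as np


def wlp_coefficients(x, order, ste_len):
    """Weighted LP: minimise sum_n w[n] * (x[n] - sum_k a_k x[n-k])**2.

    w[n] is the energy of the ste_len samples preceding n (short-term energy).
    Assumption: covariance-style range n = order .. len(x)-1; no stabilization.
    Returns the polynomial A(z) as [1, -a_1, ..., -a_order].
    """
    x = np.asarray(x, dtype=float)
    n_samples = len(x)

    # Short-term energy of x[n-ste_len : n], via a cumulative sum of squares.
    padded = np.concatenate([np.zeros(ste_len), x])
    csum = np.concatenate([[0.0], np.cumsum(padded ** 2)])
    weights = csum[ste_len:ste_len + n_samples] - csum[:n_samples]
    weights = np.maximum(weights, 0.0) + 1e-12  # keep weights strictly positive

    rows = np.arange(order, n_samples)
    # Column k-1 holds the delayed signal x[n-k].
    past = np.stack([x[rows - k] for k in range(1, order + 1)], axis=1)
    w = weights[rows]

    # Weighted normal equations R a = r.
    R = past.T @ (w[:, None] * past)
    r = past.T @ (w * x[rows])
    a = np.linalg.solve(R, r)
    return np.concatenate([[1.0], -a])


def allpole_group_delay(a_poly, n_fft=512):
    """Group delay (in samples) of the all-pole filter 1/A(z).

    Uses tau_A = Re(FFT(k * a_k) / FFT(a_k)) for the polynomial A and the
    fact that the group delay of 1/A is the negative of that of A.
    Assumes A has no roots exactly on the unit circle.
    """
    a_poly = np.asarray(a_poly, dtype=float)
    k = np.arange(len(a_poly))
    A = np.fft.rfft(a_poly, n_fft)
    B = np.fft.rfft(k * a_poly, n_fft)
    return -np.real(B / A)


if __name__ == "__main__":
    # Synthetic two-pole resonator at 0.1 cycles/sample (illustrative values).
    radius, freq = 0.95, 0.1
    a1, a2 = 2 * radius * np.cos(2 * np.pi * freq), -radius ** 2
    x = np.zeros(200)
    x[0] = 1.0
    x[1] = a1 * x[0]
    for n in range(2, len(x)):
        x[n] = a1 * x[n - 1] + a2 * x[n - 2]

    a_poly = wlp_coefficients(x, order=2, ste_len=8)
    gd = allpole_group_delay(a_poly, n_fft=512)
    print("peak frequency (cycles/sample):", np.argmax(gd) / 512)
```

Because the group delay of a cascade of resonators is the sum of the group delays of the individual resonators, each pole pair contributes its own peak to `gd`, which is the additive property the abstract refers to.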
