Using group delay functions from all-pole models for speaker recognition

Popular features for speech processing, such as mel-frequency cepstral coefficients (MFCCs), are derived from the short-term magnitude spectrum, whereas the phase spectrum remains unused. While the common argument to use only the magnitude spectrum is that the human ear is phase-deaf, phase-based features have remained less explored due to additional signal processing difficulties they introduce. A useful representation of the phase is the group delay function, but its robust computation remains difficult. This paper advocates the use of group delay functions derived from parametric all-pole models instead of their direct computation from the discrete Fourier transform. Using a subset of the vocal effort data in the NIST 2010 speaker recognition evaluation (SRE) corpus, we show that group delay features derived via parametric all-pole models improve recognition accuracy, especially under high vocal effort. Additionally, the group delay features provide comparable or improved accuracy over conventional magnitude-based MFCC features. Thus, the use of group delay functions derived from all-pole models provide an effective way to utilize information from the phase spectrum of speech signals. Index Terms: speaker verification, group delay functions, high vocal effort

[1]  Patrick Kenny,et al.  Bayesian Speaker Verification with Heavy-Tailed Priors , 2010, Odyssey.

[2]  Alvin F. Martin,et al.  Report on Performance Results in the NIST 2010 Speaker Recognition Evaluation , 2011, INTERSPEECH.

[3]  Daniel Garcia-Romero,et al.  Analysis of i-vector Length Normalization in Speaker Recognition Systems , 2011, INTERSPEECH.

[4]  Paavo Alku,et al.  Stabilised weighted linear prediction , 2009, Speech Commun..

[5]  Sridha Sridharan,et al.  The Delta-Phase Spectrum With Application to Voice Activity Detection and Speaker Recognition , 2011, IEEE Transactions on Audio, Speech, and Language Processing.

[6]  James H. Elder,et al.  Probabilistic Linear Discriminant Analysis for Inferences About Identity , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[7]  Bayya Yegnanarayana,et al.  Speech processing using group delay functions , 1991, Signal Process..

[8]  Thierry Dutoit,et al.  Chirp group delay analysis of speech signals , 2007, Speech Commun..

[9]  Rajesh M. Hegde,et al.  Significance of the Modified Group Delay Feature in Speech Recognition , 2007, IEEE Transactions on Audio, Speech, and Language Processing.

[10]  Paavo Alku,et al.  Weighted linear prediction for speech analysis in noisy conditions , 2009, INTERSPEECH.

[11]  B. Yegnanarayana Formant extraction from linear‐prediction phase spectra , 1978 .

[12]  Yadong Wang,et al.  Average instantaneous frequency (AIF) and average log-envelopes (ALE) for ASR with the Aurora 2 database , 2003, INTERSPEECH.

[13]  Kuldip K. Paliwal,et al.  Short-time phase spectrum in speech processing: A review and some experimental results , 2007, Digit. Signal Process..

[14]  Kuldip K. Paliwal,et al.  Product of power spectrum and group delay function for speech recognition , 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[15]  Patrick Kenny,et al.  Front-End Factor Analysis for Speaker Verification , 2011, IEEE Transactions on Audio, Speech, and Language Processing.

[16]  Paavo Alku,et al.  Temporally Weighted Linear Prediction Features for Tackling Additive Noise in Speaker Verification , 2010, IEEE Signal Processing Letters.

[17]  J. Makhoul,et al.  Linear prediction: A tutorial review , 1975, Proceedings of the IEEE.

[18]  Hema A. Murthy,et al.  The modified group delay function and its application to phoneme recognition , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03)..

[19]  Satoshi Nakamura,et al.  Efficient representation of short-time phase based on group delay , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).

[20]  Douglas A. Reynolds,et al.  Speaker Verification Using Adapted Gaussian Mixture Models , 2000, Digit. Signal Process..

[21]  The NIST Year 2010 Speaker Recognition Evaluation Plan 1 I NTRODUCTION , 2022 .

[22]  Yves Kamp,et al.  Robust signal selection for linear prediction analysis of voiced speech , 1993, Speech Commun..

[23]  Alan V. Oppenheim,et al.  Discrete-Time Signal Pro-cessing , 1989 .

[24]  Paavo Alku,et al.  Comparing spectrum estimators in speaker verification under additive noise degradation , 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[25]  E. Ambikairajah,et al.  Group delay features for speaker recognition , 2007, 2007 6th International Conference on Information, Communications & Signal Processing.