Significance of the LP-MVDR Spectral Ratio Method in Whisper Detection

A new spectral ratio method is proposed in this paper for detecting whispered segments within a normally phonated speech stream. The method is based on computing the ratio of the linear prediction (LP) spectrum to the minimum variance distortionless response (MVDR) spectrum. Both the linear prediction method and the LP residual method are, by themselves, found to be inadequate for modelling the medium to high frequencies of the speech signal. In contrast, the MVDR method is robust in modelling the spectrum at all frequencies. The proposed spectral ratio method exploits this difference in spectral estimation to separate whispered segments, which have fewer harmonics and more noise, from normally phonated segments of speech. The proposed method is compared with other methods, namely the LP residual and spectral flatness methods. Whisper detection experiments are conducted on the CHAINS database. The proposed method shows reasonable improvements, as seen in the ROC curves and the whisper diarization error rate.
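As a rough illustration of the quantity described above, the following Python sketch computes an LP spectrum, an MVDR spectrum, and their ratio for a single frame. It is not the paper's implementation: the function names, the model order of 12, the Hamming window, the tiny diagonal loading, and the frequency grid are all assumptions made for illustration, and the textbook forms of the two estimators are used (the autocorrelation method for LP and the Capon form 1 / (v^H R^{-1} v) for MVDR). How the ratio is turned into a whisper decision is not specified in the abstract and is left out.

```python
import numpy as np


def autocorr(frame, order):
    """Biased autocorrelation r[0..order] of a Hamming-windowed frame."""
    x = np.asarray(frame, dtype=float) * np.hamming(len(frame))
    full = np.correlate(x, x, mode="full")
    mid = len(x) - 1                      # index of lag 0
    r = full[mid:mid + order + 1].copy() / len(x)
    r[0] *= 1.0 + 1e-9                    # assumed tiny diagonal loading for stability
    return r


def toeplitz(r):
    """Symmetric Toeplitz matrix with first column r."""
    n = np.arange(len(r))
    return r[np.abs(np.subtract.outer(n, n))]


def lp_spectrum(r, freqs):
    """LP spectrum Pe / |A(e^{jw})|^2 from the autocorrelation method."""
    order = len(r) - 1
    # Normal equations: R a = -r[1:], with a[0] fixed to 1.
    a = np.linalg.solve(toeplitz(r[:order]), -r[1:])
    a = np.concatenate(([1.0], a))
    err = np.dot(a, r)                    # prediction error power
    A = np.exp(-1j * np.outer(freqs, np.arange(order + 1))) @ a
    return err / np.abs(A) ** 2


def mvdr_spectrum(r, freqs):
    """MVDR (Capon) spectrum 1 / (v^H R^{-1} v) on the same autocorrelation."""
    order = len(r) - 1
    R_inv = np.linalg.inv(toeplitz(r))
    V = np.exp(1j * np.outer(freqs, np.arange(order + 1)))  # one steering vector per row
    quad = np.einsum("fi,ij,fj->f", V.conj(), R_inv, V).real
    return 1.0 / quad


def lp_mvdr_ratio(frame, order=12, n_freqs=256):
    """Per-frequency ratio of the LP spectrum to the MVDR spectrum (hypothetical helper)."""
    freqs = np.linspace(0.0, np.pi, n_freqs)
    r = autocorr(frame, order)
    return lp_spectrum(r, freqs) / mvdr_spectrum(r, freqs)
```

With these definitions, the MVDR spectrum of a given order never exceeds the LP spectrum of the same order, so the ratio stays at or above 1. It is largest where the LP model places sharp peaks and stays lower where the spectrum is flat and noise-like.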
