Analysis of spurious vowel-like regions (VLRs) detected by excitation source information

This work treats vowels and semivowels as vowel-like regions (VLRs). It analyzes the spurious VLRs detected by a signal-processing-based method that uses excitation source information, and discusses the limitation of excitation information in rejecting some nasals and voiced consonants as non-VLRs. An attempt is made to reduce the spurious VLRs relative to the existing signal-processing-based method for VLR detection [1]. A multi-class statistical phone classifier is trained to classify speech into broad vowel, consonant, and silence categories. The classifier outputs are combined to obtain evidence for vowel-like regions, for different broad categories of consonants, and for silence regions. The output of the existing signal-processing method is compared with these evidences from the statistical method, and the spurious VLRs are eliminated using them. Experimental studies on the TIMIT and in-house databases demonstrate a significant reduction in spurious VLRs with a small loss in VLR detection performance. A net gain of 4.21% and 7.71% in frame error rate is achieved for the TIMIT and in-house databases, respectively.
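The idea of pruning detected VLRs with classifier evidence can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the evidences are combined, so the majority-support rule, the `min_support` threshold, and the function names `find_segments`, `prune_spurious_vlrs`, and `frame_error_rate` are all assumptions made for illustration. Frame-level boolean masks are assumed for both the signal-processing output and the classifier's VLR evidence, and the toy masks are invented.

```python
import numpy as np

def find_segments(mask):
    """Return (start, end) pairs, end exclusive, for each run of True frames."""
    padded = np.concatenate(([False], mask, [False])).astype(int)
    diff = np.diff(padded)
    starts = np.where(diff == 1)[0]
    ends = np.where(diff == -1)[0]
    return list(zip(starts.tolist(), ends.tolist()))

def prune_spurious_vlrs(sp_mask, clf_vlr_mask, min_support=0.5):
    """Drop detected VLR segments that the classifier does not support.

    sp_mask      : frame-level VLR decisions from the signal-processing method
    clf_vlr_mask : frame-level VLR evidence from the statistical classifier
    min_support  : hypothetical threshold on the fraction of frames in a
                   segment that the classifier must also label as VLR
    """
    sp_mask = np.asarray(sp_mask, dtype=bool)
    clf_vlr_mask = np.asarray(clf_vlr_mask, dtype=bool)
    pruned = sp_mask.copy()
    for start, end in find_segments(sp_mask):
        if clf_vlr_mask[start:end].mean() < min_support:
            pruned[start:end] = False  # treat the whole segment as spurious
    return pruned

def frame_error_rate(pred_mask, ref_mask):
    """Fraction of frames whose VLR / non-VLR decision differs from the reference."""
    pred_mask = np.asarray(pred_mask, dtype=bool)
    ref_mask = np.asarray(ref_mask, dtype=bool)
    return float(np.mean(pred_mask != ref_mask))

# Toy masks (invented for illustration): two detected segments, one spurious.
sp_mask = [0, 1, 1, 1, 0, 0, 1, 1, 0, 0]
clf_vlr_mask = [0, 1, 1, 0, 0, 0, 0, 0, 0, 0]
ref_mask = [0, 1, 1, 1, 0, 0, 0, 0, 0, 0]

pruned = prune_spurious_vlrs(sp_mask, clf_vlr_mask)
fer_before = frame_error_rate(sp_mask, ref_mask)
fer_after = frame_error_rate(pruned, ref_mask)
```

Deciding per segment rather than per frame keeps a genuine VLR intact when the classifier disagrees on only a few of its frames. The trade-off mirrors the one reported above: a segment removed in error lowers VLR detection performance even as spurious detections fall.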

[1] S. R. Mahadeva Prasanna, et al. Detection of vowel onset point events using excitation information, 2005, INTERSPEECH.

[2] Steve Young, et al. The HTK book version 3.4, 2006.

[3] Bayya Yegnanarayana, et al. Characterization of Glottal Activity From Speech Signals, 2009, IEEE Signal Processing Letters.

[4] Lawrence R. Rabiner, et al. A tutorial on hidden Markov models and selected applications in speech recognition, 1989, Proc. IEEE.

[5] Hsiao-Wuen Hon, et al. Speaker-independent phone recognition using hidden Markov models, 1989, IEEE Trans. Acoust. Speech Signal Process.

[6] K. Sreenivasa Rao, et al. Vowel Onset Point Detection for Low Bit Rate Coded Speech, 2012, IEEE Transactions on Audio, Speech, and Language Processing.

[7] S. Shahnawazuddin, et al. Assamese spoken query system to access the price of agricultural commodities, 2013, 2013 National Conference on Communications (NCC).

[8] Bayya Yegnanarayana, et al. Epoch Extraction From Speech Signals, 2008, IEEE Transactions on Audio, Speech, and Language Processing.

[9] S. R. Mahadeva Prasanna, et al. Vowel Onset Point Detection Using Source, Spectral Peaks, and Modulation Spectrum Energies, 2009, IEEE Transactions on Audio, Speech, and Language Processing.

[10] S. R. Mahadeva Prasanna, et al. Speaker verification under degraded condition: a perceptual study, 2011, Int. J. Speech Technol.

[11] Leon Cohen, et al. Time Frequency Analysis: Theory and Applications, 1994.

[12] S. R. Mahadeva Prasanna, et al. Speaker Verification by Vowel and Nonvowel Like Segmentation, 2013, IEEE Transactions on Audio, Speech, and Language Processing.

[13] S. R. M. Prasanna, et al. Significance of Vowel-Like Regions for Speaker Verification Under Degraded Conditions, 2011, IEEE Transactions on Audio, Speech, and Language Processing.