A pre-processing method for improvement of vowel onset point detection under noisy conditions

The vowel onset point (VOP) is the instant at which the vowel region starts in a speech signal. VOPs play a vital role in speech processing applications such as syllable detection, speaker verification, duration modification, and language identification. Several algorithms exist for detecting the VOP instants in a speech signal. The algorithm based on combined evidence extracted from the source excitation, spectral peaks, and modulation spectrum is used as the baseline system in the present work. The baseline system performs well on clean speech. Under noisy conditions, however, its performance degrades: a larger number of spurious VOPs are detected. According to the available literature, this degradation is due to the spectral broadening of speech in noisy environments. In this paper we propose a pre-processing technique, applied on top of the baseline system, to reduce this spectral broadening effect of noise. The noisy speech is first passed through the pre-processing algorithm, and the pre-processed speech is then passed through the baseline system to detect the VOPs. Experiments were carried out on clean speech and on speech under different noise conditions. The results show an improvement of 16-21% over the existing baseline system in terms of removal of spurious VOPs under different noisy conditions. Further, the proposed method was compared with two of the best-performing VOP detection techniques and was found to give superior performance in terms of identification accuracy and identification rate.
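To make the notion of a spurious VOP concrete, the following is a minimal sketch of how detected VOPs might be scored against hand-marked reference VOPs. It is not the evaluation procedure of the paper: the function name `score_vops`, the greedy matching rule, and the 40 ms tolerance are all assumptions made for illustration, since the abstract does not define its metrics.

```python
from bisect import bisect_left


def score_vops(detected, reference, tol=0.040):
    """Count matched, spurious and missed VOPs (all times in seconds).

    Hypothetical scoring rule: a detected VOP is 'matched' if an unused
    reference VOP lies within +/- tol of it, otherwise it is 'spurious'.
    The 40 ms default tolerance is an assumption, not a value from the paper.
    """
    reference = sorted(reference)
    used = [False] * len(reference)
    matched = 0
    spurious = 0

    for d in sorted(detected):
        # Only the two reference points adjacent to d are considered.
        i = bisect_left(reference, d)
        best = None
        for j in (i - 1, i):
            if 0 <= j < len(reference) and not used[j]:
                dist = abs(reference[j] - d)
                if dist <= tol and (best is None or dist < abs(reference[best] - d)):
                    best = j
        if best is None:
            spurious += 1
        else:
            used[best] = True
            matched += 1

    missed = len(reference) - matched
    return {"matched": matched, "spurious": spurious, "missed": missed}


if __name__ == "__main__":
    # Toy timestamps, invented purely for illustration.
    print(score_vops([0.10, 0.31, 0.50, 0.90], [0.11, 0.30, 0.70]))
```

Under such a rule, comparing the spurious counts obtained with and without a pre-processing stage on the same noisy utterances would give one way to express a reduction in spurious VOPs as a percentage.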
