VOP Detection for Read and Conversation Speech using CWT Coefficients and Phone Boundaries

In this paper, we propose a novel approach for accurate detection of vowel onset points (VOPs). A VOP is the instant at which a vowel begins in the speech signal. Precise identification of VOPs is important for speech applications such as speech segmentation and speech rate modification. Existing methods detect the majority of VOPs only to within a 40 ms deviation, which may not be precise enough for these applications. To address this issue, we propose a two-stage approach. In the first stage, VOPs are detected using continuous wavelet transform (CWT) coefficients; in the second stage, the positions of the detected VOPs are corrected using phone boundaries, which are detected by the spectral transition measure method. Experiments are conducted on the TIMIT and Bengali speech corpora. The performance of the proposed approach is compared with two standard signal-processing-based methods, and the evaluation results show that it performs better than both.
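The second-stage correction can be illustrated with a minimal sketch. The abstract does not state the exact correction rule, so the code below assumes one plausible reading: each first-stage VOP is moved to the nearest phone boundary if that boundary lies within a tolerance, and is otherwise left unchanged. The function name `snap_vops_to_boundaries`, the parameter `max_shift_ms`, and its 40 ms default (borrowed from the deviation figure quoted above) are hypothetical choices for illustration, not details taken from the paper.

```python
from bisect import bisect_left
from typing import List, Sequence


def snap_vops_to_boundaries(
    vops_ms: Sequence[float],
    boundaries_ms: Sequence[float],
    max_shift_ms: float = 40.0,  # assumed tolerance, not specified in the paper
) -> List[float]:
    """Move each candidate VOP to the nearest phone boundary within a tolerance.

    Assumption: "correction" means nearest-boundary snapping. A VOP with no
    boundary within max_shift_ms is returned unchanged.
    """
    bounds = sorted(boundaries_ms)
    corrected = []
    for vop in vops_ms:
        # Locate the insertion point, then inspect the boundary on each side.
        i = bisect_left(bounds, vop)
        candidates = bounds[max(i - 1, 0):i + 1]
        nearest = min(candidates, key=lambda b: abs(b - vop), default=None)
        if nearest is not None and abs(nearest - vop) <= max_shift_ms:
            corrected.append(nearest)
        else:
            corrected.append(vop)
    return corrected


if __name__ == "__main__":
    # Toy values in milliseconds, invented purely for illustration.
    print(snap_vops_to_boundaries([120.0, 320.0, 410.0], [100.0, 250.0, 400.0]))
```

In this toy run, the candidates at 120 ms and 410 ms each have a boundary within the tolerance and are moved onto it, while the candidate at 320 ms is more than 40 ms from both neighbouring boundaries and stays where it is.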
