Excitation Source and Vocal Tract System Based Acoustic Features for Detection of Nasals in Continuous Speech

The aim of the current study is to propose acoustic features for the detection of nasals in continuous speech. Acoustic features that represent specific characteristics of speech production are extracted. Features representing excitation source characteristics are extracted using the zero frequency filtering method, and features representing vocal tract system characteristics are extracted using the zero time windowing method. Feature sets are formed by combining subsets of these features, and each set is evaluated for how well it represents nasals in continuous speech in three languages: English, Hindi, and Telugu. Results show that nasal detection is reliable and consistent across all three languages.
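As a rough illustration of the excitation-source branch, the sketch below shows a minimal zero frequency filtering (ZFF) pipeline in Python. It is a sketch under stated assumptions, not the paper's implementation: the function names are illustrative, the input is assumed to be a mono signal x at sampling rate fs, and the trend-removal window length (about one average pitch period, here 10 ms) is a tunable assumption.

```python
import numpy as np

def zero_frequency_filter(x, fs, trend_win_ms=10.0):
    """Illustrative ZFF: double integration followed by local-mean trend removal."""
    # Differencing removes any DC offset / slow drift before integration.
    dx = np.diff(x, prepend=x[0])

    # Two cascaded ideal resonators at 0 Hz are equivalent to two cumulative sums.
    y = np.cumsum(np.cumsum(dx))

    # The double integration produces a large polynomial trend; remove it by
    # repeatedly subtracting a local mean over roughly one pitch period.
    win = max(3, int(fs * trend_win_ms / 1000.0))
    kernel = np.ones(win) / win
    for _ in range(3):  # number of passes is an assumption for illustration
        y = y - np.convolve(y, kernel, mode="same")

    return y

def epochs_from_zff(y):
    """Negative-to-positive zero crossings of the ZFF output approximate epochs."""
    return np.flatnonzero((y[:-1] < 0) & (y[1:] >= 0)) + 1
```

In this sketch, the zero crossings of the ZFF output approximate glottal closure instants, and quantities derived around them (e.g., their spacing and the local slope of the ZFF signal) can serve as excitation-source features of the kind combined in the proposed feature sets.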
