Spectral transition measure for detection of obstruents

Obstruents are important acoustic events (i.e., abrupt-consonantal landmarks) in the speech signal. This paper presents the use of a novel Spectral Transition Measure (STM) to locate obstruents in continuous speech. Obstruent detection involves detecting the phonetic boundaries associated with obstruent sounds. We propose using STM information derived from the state-of-the-art Mel Frequency Cepstral Coefficients (MFCC) feature set and from a newly developed feature set, viz., MFCC-TMP, which uses the Teager Energy Operator (TEO) to implicitly exploit magnitude and phase information within the MFCC framework. The key idea is to exploit the ability of STM to capture the highly dynamic transitional characteristics associated with obstruent sounds. The experimental setup is developed on the entire TIMIT database. For a 20 ms agreement (tolerance) duration, the obstruent detection rate is found to be 97.59 % with 17.65 % false acceptance using MFCC-STM, and 96.42 % with 12.88 % false acceptance using MFCC-TMP-STM. Finally, STM-based features combined with the static representation (i.e., MFCC-STM and MFCC-TMP-STM) are evaluated on a phone recognition task.
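As an illustration of how an STM contour can be obtained from a frame-level feature matrix, the following minimal sketch uses the commonly cited regression-based form: for each frame, a least-squares slope is fitted to every coefficient trajectory over a short window, and the squared slopes are averaged across coefficients. The abstract does not give the exact formulation used here, so the window half-width (`half_width`), the edge-padding choice, and the function name are assumptions made for illustration only. The input may be any static feature matrix, e.g., MFCC or MFCC-TMP frames computed elsewhere.

```python
import numpy as np


def spectral_transition_measure(feats, half_width=2):
    """Sketch of a regression-based STM (assumed form, not confirmed by the paper).

    feats: array of shape (num_frames, num_coeffs), e.g. MFCC frames.
    half_width: hypothetical window half-width in frames.
    Returns one STM value per frame.
    """
    feats = np.asarray(feats, dtype=float)
    num_frames = feats.shape[0]
    offsets = np.arange(-half_width, half_width + 1)
    denom = np.sum(offsets ** 2)

    # Repeat the first and last frames so every frame has a full window.
    padded = np.pad(feats, ((half_width, half_width), (0, 0)), mode="edge")

    # Least-squares slope of each coefficient trajectory around each frame.
    slopes = np.zeros_like(feats)
    for k in offsets:
        start = half_width + k
        slopes += k * padded[start:start + num_frames]
    slopes /= denom

    # Average the squared slopes over the coefficient dimension.
    return np.mean(slopes ** 2, axis=1)
```

Frames where the contour peaks are candidates for abrupt spectral change. How those peaks are selected and matched to reference boundaries within the 20 ms tolerance is not specified in the abstract and is left out of this sketch.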
