Acoustic-phonetic features for the automatic classification of stop consonants

This paper investigates the acoustic-phonetic characteristics of the American English stop consonants. Features studied in the literature are evaluated for their information content, and new features are proposed. A statistically guided, knowledge-based, acoustic-phonetic system is proposed for the automatic classification of stops in speaker-independent continuous speech. The system uses new auditory-based front-end processing and incorporates new algorithms for extracting and manipulating the acoustic-phonetic features that proved rich in information content. Recognition experiments are performed using hard-decision algorithms on stops extracted from the continuous speech of 60 TIMIT speakers (not used in the design process) from seven different dialects of American English. An accuracy of 96% is obtained for voicing detection, 90% for place-of-articulation detection, and 86% for the overall classification of stops.
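As a rough illustration of how two hard decisions can combine into a stop label, the sketch below maps a voicing decision and a place decision onto the six American English stops. It is not the paper's algorithm: the function names, the lookup table, and the independence assumption used in the accuracy calculation are all hypothetical and introduced here only for illustration.

```python
# Hypothetical sketch: none of these names come from the paper.
# The six American English stops, indexed by (voiced, place of articulation).
STOPS = {
    (False, "labial"): "p",
    (False, "alveolar"): "t",
    (False, "velar"): "k",
    (True, "labial"): "b",
    (True, "alveolar"): "d",
    (True, "velar"): "g",
}


def classify_stop(voiced: bool, place: str) -> str:
    """Combine a hard voicing decision and a hard place decision into a stop label."""
    return STOPS[(voiced, place)]


def joint_accuracy_if_independent(voicing_acc: float, place_acc: float) -> float:
    """Joint accuracy ASSUMING voicing and place errors are independent.

    This independence is an assumption made for illustration only;
    the abstract does not state it.
    """
    return voicing_acc * place_acc


if __name__ == "__main__":
    print(classify_stop(True, "velar"))
    print(round(joint_accuracy_if_independent(0.96, 0.90), 3))
```

Under that independence assumption, the product of the reported voicing and place accuracies is 0.96 × 0.90 ≈ 0.864, which is in line with the reported 86% overall figure.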
