On the relevance of some spectral and temporal patterns for vowel classification

Many previous studies have suggested that the information necessary for identifying vowels in continuous speech is distributed both within and outside vowel boundaries. This information appears to be embedded in the speech signal in the form of various acoustic cues or patterns: spectral, energy, static, dynamic, and temporal. In a recent paper we identified seven types of acoustic patterns that might be exploited by listeners in the identification of coarticulated vowels. The current paper extends that study and quantifies the relevance for vowel classification of eight types of acoustic patterns, including static spectral patterns, dynamic spectral patterns, and temporal-durational patterns. Four of these eight patterns are not directly exploited by current automatic speech recognition techniques in computing the likelihood of each phonetic model, yet they proved to contain significant vowel information. Two of these four new patterns are static spectral patterns lying outside the currently accepted boundaries of vowels, one is a double-slope dynamic pattern, and one is a simple durational pattern. The findings of this paper may be important both for automatic speech recognition models and for models of human vowel and phoneme perception.
