Vowel-onset detection.

An algorithm is presented that correctly detects the large majority of vowel onsets in fluent speech. It rests on the simple assumption that vowel onsets are characterized by the appearance of rapidly increasing resonance peaks in the amplitude spectrum. Applied to carefully articulated, isolated words, the algorithm produces a high number of false alarms, predominantly before consonants that can function as vowels in a different context, for example in another language or as a syllabic consonant. Adjusting some parameter settings reduces this number of false alarms for isolated words significantly, without risking a large number of missed detections. The temporal accuracy of the algorithm is better than 20 ms, measured with respect to the perceptual moment of occurrence of a vowel onset as determined by a phonetician.
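
The core assumption, that a vowel onset shows up as resonance peaks rising rapidly in the amplitude spectrum, can be illustrated with a minimal sketch. This is not the algorithm from the paper: the function names (`peak_bins`, `onset_strength`, `detect_onsets`), the `threshold` and `min_gap` parameters, and the toy 8-bin spectra are all hypothetical, chosen only to show the idea of summing amplitude increases at spectral peaks from one frame to the next and picking local maxima of that measure.

```python
from typing import List, Sequence


def peak_bins(spectrum: Sequence[float]) -> List[int]:
    """Indices of local maxima (candidate resonance peaks) in one amplitude spectrum."""
    return [
        k
        for k in range(1, len(spectrum) - 1)
        if spectrum[k] > spectrum[k - 1] and spectrum[k] >= spectrum[k + 1]
    ]


def onset_strength(frames: Sequence[Sequence[float]]) -> List[float]:
    """Per-frame sum of amplitude increases at the current frame's spectral peaks."""
    strength = [0.0]  # the first frame has no predecessor to compare against
    for prev, cur in zip(frames, frames[1:]):
        rise = sum(max(cur[k] - prev[k], 0.0) for k in peak_bins(cur))
        strength.append(rise)
    return strength


def detect_onsets(
    frames: Sequence[Sequence[float]], threshold: float, min_gap: int = 5
) -> List[int]:
    """Frame indices where the onset strength is a local maximum above `threshold`.

    `threshold` and `min_gap` are illustrative parameters (assumptions),
    not values taken from the paper.
    """
    s = onset_strength(frames)
    onsets: List[int] = []
    for t in range(1, len(s) - 1):
        is_local_max = s[t] > s[t - 1] and s[t] >= s[t + 1]
        if s[t] >= threshold and is_local_max:
            if not onsets or t - onsets[-1] >= min_gap:
                onsets.append(t)
    return onsets


# Toy input: five silent frames followed by five vowel-like frames
# with two resonance peaks (bins 2 and 5).
silence = [0.0] * 8
vowel = [0.0, 0.2, 1.0, 0.2, 0.1, 0.8, 0.1, 0.0]
frames = [silence] * 5 + [vowel] * 5

print(detect_onsets(frames, threshold=0.5))
```

In this toy input the only frame with rising peaks is the first vowel-like frame, so that is the single onset reported. Raising `threshold` trades false alarms for missed detections, which loosely mirrors the parameter adjustment the abstract describes for isolated words.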
