Time-domain, digital segmentation of connected natural speech

The digital segmentation algorithm described in this paper subdivides speech signals into discrete sections that make it possible to localize most of the spoken phonemes in natural speech. Two pre-segmentation steps separate pauses and voiceless parts from the remaining (voiced) signal. The subsequent main segmentation step attempts to describe the speed of articulation in the vocal tract in terms of a few global speech parameters. Because the vocal tract does not move at constant speed during an utterance, but instead tries to reach the articulatory target position associated with each phoneme, sections with relatively small changes in vocal tract position ("stationary" segments) can be separated from sections with larger changes ("dynamic" segments). The dynamic segments can be characterized further by considering the direction in which the parameters change.
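The three steps outlined above can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not name the global speech parameters or the decision rules, so the short-time energy, the zero-crossing rate, the log-energy track, all threshold values, and the function names (`label_frames`, `merge_labels`) are assumptions chosen only for illustration.

```python
import math

def frame_energy(frame):
    # Mean squared amplitude of one frame (assumed pause detector).
    return sum(s * s for s in frame) / len(frame)

def zero_crossing_rate(frame):
    # Fraction of adjacent sample pairs that change sign
    # (assumed voiceless detector; needs at least two samples).
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return crossings / (len(frame) - 1)

def label_frames(frames, pause_energy=1e-4, voiceless_zcr=0.3, change_limit=0.2):
    """Label each frame; all thresholds are hypothetical values."""
    labels = []
    prev_param = None  # parameter of the previous voiced frame, if any
    for frame in frames:
        energy = frame_energy(frame)
        # Pre-segmentation step 1: pauses.
        if energy < pause_energy:
            labels.append("pause")
            prev_param = None
            continue
        # Pre-segmentation step 2: voiceless parts.
        if zero_crossing_rate(frame) > voiceless_zcr:
            labels.append("voiceless")
            prev_param = None
            continue
        # Main step: frame-to-frame change of one global parameter
        # (log energy here, standing in for the paper's parameters).
        param = math.log(energy)
        if prev_param is None:
            labels.append("stationary")  # nothing to compare against yet
        else:
            delta = param - prev_param
            if abs(delta) <= change_limit:
                labels.append("stationary")
            elif delta > 0:
                labels.append("dynamic-rising")
            else:
                labels.append("dynamic-falling")
        prev_param = param
    return labels

def merge_labels(labels):
    # Collapse runs of equal labels into (label, start_frame, end_frame) tuples,
    # with end_frame exclusive.
    segments = []
    for i, label in enumerate(labels):
        if segments and segments[-1][0] == label:
            segments[-1] = (label, segments[-1][1], i + 1)
        else:
            segments.append((label, i, i + 1))
    return segments
```

Calling `merge_labels(label_frames(frames))` on a list of equal-length sample frames yields contiguous segments. A real system would track several parameters at once and smooth the decisions over time; this sketch compares only adjacent frames.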
