Recognition of unaspirated plosives: A statistical approach

This paper presents the results of a study of computer recognition of unaspirated plosives in commonly used polysyllabic words uttered by three informants. The onglide transitions of the first two formants, together with their durations, have been found to be an effective feature set for recognizing unaspirated plosives. The rates of transition of these two formants, used as a feature set, perform significantly worse than the transitions and durations. The maximum likelihood method, assuming a normal distribution for the feature set, provides an adequate tool for classification. Assuming both intergroup and intragroup independence of the features reduces recognition scores. Prior knowledge of the target vowel is found to be necessary for reasonable efficiency, and prior knowledge of voicing improves classification efficiency to some extent. The physiological factors responsible for the variation in recognition score across the plosives are discussed. For labials and velars the recognition score is very high, nearly 90 percent. An attempt is made to correlate the dynamics of tongue-body motion with the variation in recognition scores. Back vowels as targets have been found to give better classification of the preceding consonants. The machine recognition results are compared with published results of perception tests and are found to be of the same order.
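The maximum likelihood classification step, with a normal (Gaussian) model of each class's feature vector, can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' program. The feature layout (for example F1 and F2 onglide transition extents plus their durations), the keying of models by (plosive, vowel), equal class priors, and all function names (`train`, `classify`, `log_gaussian`) are hypothetical. Setting `diagonal=True` zeroes the off-diagonal covariance terms, a rough analogue of the feature-independence assumption. Passing `vowel` restricts the search to models trained for that target vowel, a rough analogue of prior knowledge of the vowel.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]
Key = Tuple[str, str]  # hypothetical keying: (plosive label, target vowel label)


def mean(samples: Sequence[Vector]) -> Vector:
    n = len(samples)
    return [sum(s[j] for s in samples) / n for j in range(len(samples[0]))]


def covariance(samples: Sequence[Vector], mu: Vector, diagonal: bool = False) -> Matrix:
    """Unbiased sample covariance; diagonal=True treats features as independent."""
    n, d = len(samples), len(mu)
    cov = [[0.0] * d for _ in range(d)]
    for s in samples:
        for i in range(d):
            for j in range(d):
                cov[i][j] += (s[i] - mu[i]) * (s[j] - mu[j])
    for i in range(d):
        for j in range(d):
            cov[i][j] = 0.0 if (diagonal and i != j) else cov[i][j] / (n - 1)
    return cov


def cholesky(a: Matrix) -> Matrix:
    """Lower-triangular L with a = L L^T (a must be positive definite)."""
    d = len(a)
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("covariance is not positive definite")
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L


def log_gaussian(x: Vector, mu: Vector, cov: Matrix) -> float:
    """Log density of a multivariate normal N(mu, cov) evaluated at x."""
    L = cholesky(cov)
    d = len(x)
    # Forward substitution solves L z = (x - mu), so |z|^2 is the Mahalanobis distance.
    z: List[float] = []
    for i in range(d):
        z.append((x[i] - mu[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    mahalanobis = sum(v * v for v in z)
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(d))
    return -0.5 * (mahalanobis + log_det + d * math.log(2.0 * math.pi))


def train(data: Dict[Key, List[Vector]], diagonal: bool = False) -> Dict[Key, Tuple[Vector, Matrix]]:
    """Fit one Gaussian (mean, covariance) per (plosive, vowel) class."""
    models = {}
    for key, samples in data.items():
        mu = mean(samples)
        models[key] = (mu, covariance(samples, mu, diagonal))
    return models


def classify(models: Dict[Key, Tuple[Vector, Matrix]], x: Vector,
             vowel: Optional[str] = None) -> str:
    """Return the plosive of the most likely class (equal priors assumed)."""
    best_key, best_ll = None, -math.inf
    for key, (mu, cov) in models.items():
        if vowel is not None and key[1] != vowel:
            continue  # known target vowel: skip models for other vowels
        ll = log_gaussian(x, mu, cov)
        if ll > best_ll:
            best_key, best_ll = key, ll
    if best_key is None:
        raise ValueError("no trained model matches the given vowel")
    return best_key[0]
```

Working with log-likelihoods and a Cholesky factor avoids inverting the covariance matrix explicitly; with equal priors, choosing the largest log-likelihood is the maximum likelihood decision. Each full covariance estimate needs at least one more training token than there are features to be positive definite.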