A constraint satisfaction model for recognition of stop consonant-vowel (SCV) utterances

We propose a model for recognition of utterances of consonant-vowel (CV) units. Acoustic-phonetic knowledge of the CV classes is incorporated as the constraints of a constraint satisfaction model, which combines evidence from multiple classifiers. The significant feature of this model is that discrimination among the CV units can be enhanced by combining even weak evidence derived from the features. The evidence is obtained from multilayer feedforward neural networks, each trained for a subgroup of the CV classes, and is then enhanced using a set of feedback subnetworks in the constraint satisfaction model. The connection weights in the feedback subnetworks are derived from acoustic-phonetic knowledge and from the performance statistics of the trained networks. The performance of the proposed model is demonstrated on recognition of utterances of a large number (80) of stop consonant-vowel (SCV) units of the Indian language Hindi.
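To illustrate how feedback can sharpen weak evidence, the following is a minimal sketch and not the model evaluated in the paper. It assumes a toy set of four CV classes, two hypothetical subgroup classifiers (one scoring the consonant, one scoring the vowel) whose outputs are simply averaged, and uniform mutual inhibition between class units in place of weights derived from acoustic-phonetic knowledge and performance statistics. The function names, scores, and parameter values are all illustrative assumptions.

```python
from itertools import product


def combine_evidence(consonant_ev, vowel_ev):
    """Average the two subgroup scores for every toy CV class (an assumption)."""
    return {
        c + v: 0.5 * (pc + pv)
        for (c, pc), (v, pv) in product(consonant_ev.items(), vowel_ev.items())
    }


def relax(external, inhibition=1.0, rate=0.2, steps=200):
    """Deterministic relaxation over one unit per class.

    Each unit receives its combined evidence as external input and is
    inhibited by the activations of all competing units. Activations
    stay in [0, 1]: a positive net input pushes a unit toward 1, a
    negative one pushes it toward 0.
    """
    act = {label: 0.0 for label in external}
    for _ in range(steps):
        total = sum(act.values())
        new_act = {}
        for label, ev in external.items():
            # Net input: own evidence minus inhibition from competitors.
            net = rate * (ev - inhibition * (total - act[label]))
            if net > 0:
                new_act[label] = act[label] + net * (1.0 - act[label])
            else:
                new_act[label] = act[label] + net * act[label]
        act = new_act  # synchronous update of all units
    return act


# Weak evidence: each hypothetical classifier only slightly prefers one option.
evidence = combine_evidence({"k": 0.55, "g": 0.45}, {"a": 0.6, "i": 0.4})
final = relax(evidence)
winner = max(final, key=final.get)
print(winner, {label: round(a, 3) for label, a in final.items()})
```

In this toy setting the combined scores of the best and second-best classes differ by only 0.05 before relaxation, while after relaxation the unit for the preferred class is close to 1 and its competitors are close to 0. Knowledge-derived weights, as described in the abstract, would replace the uniform `inhibition` value with class-specific connections.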
