Parametrisation of the speech space using the self-organising neural network

Speech recognition remains a difficult problem, largely because current systems cope poorly with connected speech. Neural networks are able to learn some aspects of this task. An unsupervised learning scheme such as the self-organising map can both classify and order speech sounds and provide a front end to higher-level processing. A map of phonemes (a phonotopic map) is used to trace the trajectories of sounds from utterances. The self-organising map provides a means of reducing the inherent dimensionality of the speech data. A crinkle factor, used to measure how closely the dimensionality of the map matches that of the speech input, indicates that speech has an inherent dimensionality of at least three or four. A projection of the map and the speech data shows how the self-organising map fits the speech space.
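
The following is a minimal sketch of the kind of Kohonen self-organising map described above, assuming speech has already been reduced to fixed-length spectral feature vectors per frame; the grid size, decay schedules, and the `train_som` / `map_trajectory` names are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def train_som(data, grid_shape=(10, 10), n_iter=10000,
              lr0=0.5, sigma0=3.0, seed=0):
    """Train a 2-D Kohonen self-organising map on feature vectors.

    data: (n_samples, n_features) array, e.g. short-time spectral
    vectors extracted from speech frames (hypothetical front end).
    Returns the trained weight grid of shape grid_shape + (n_features,).
    """
    rng = np.random.default_rng(seed)
    n_rows, n_cols = grid_shape
    n_features = data.shape[1]
    # Initialise weights randomly within the range of the data.
    weights = rng.uniform(data.min(), data.max(),
                          size=(n_rows, n_cols, n_features))
    # Pre-compute grid coordinates for the neighbourhood function.
    grid = np.stack(np.meshgrid(np.arange(n_rows), np.arange(n_cols),
                                indexing="ij"), axis=-1)

    for t in range(n_iter):
        # Exponentially decay the learning rate and neighbourhood radius.
        lr = lr0 * np.exp(-t / n_iter)
        sigma = sigma0 * np.exp(-t / n_iter)

        # Pick a random training vector and find its best-matching unit.
        x = data[rng.integers(len(data))]
        dists = np.linalg.norm(weights - x, axis=-1)
        bmu = np.unravel_index(np.argmin(dists), dists.shape)

        # Gaussian neighbourhood around the BMU on the map grid.
        grid_dist2 = np.sum((grid - np.array(bmu)) ** 2, axis=-1)
        h = np.exp(-grid_dist2 / (2 * sigma ** 2))

        # Move the BMU and its neighbours towards the input vector.
        weights += lr * h[..., None] * (x - weights)

    return weights


def map_trajectory(frames, weights):
    """Map a sequence of speech frames to BMU grid coordinates,
    giving a trajectory across the (phonotopic) map."""
    traj = []
    for x in frames:
        dists = np.linalg.norm(weights - x, axis=-1)
        traj.append(np.unravel_index(np.argmin(dists), dists.shape))
    return traj
```

Tracing an utterance through `map_trajectory` yields the sequence of map coordinates that the abstract refers to as a trajectory over the phonotopic map; the two-dimensional grid is where the dimensionality reduction happens, since each high-dimensional frame is summarised by its best-matching unit.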
