Context-modulated vowel discrimination using connectionist networks

Abstract: A method for constructing isomorphic context-specific connectionist networks for phoneme recognition is introduced. It is shown that such networks can be merged into a single context-modulated network that makes use of second-order unit interconnections. This is accomplished by computing a minimal basis for the set of context-specific weight vectors using the singular value decomposition algorithm. Compact networks are thus obtained in which the phoneme discrimination surfaces are modulated by phonetic context. These methods are demonstrated on a small but non-trivial vowel recognition problem. It is shown that a context-modulated network can achieve an error rate lower than that of a context-independent network by a factor of 7. Similar results are obtained using optimized rather than constructed networks.
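The merging step described in the abstract can be illustrated with a minimal sketch. It is not the author's procedure; it assumes each context-specific network reduces to a single logistic unit with one weight vector per context, and that the context is supplied as a one-hot vector. The names `minimal_basis` and `context_modulated_output`, the tolerance, and the synthetic weights are hypothetical and chosen only for illustration. The singular value decomposition yields a small set of basis weight vectors, and each context contributes multiplicative gains on the basis projections, which is one way to realize second-order (context unit times input unit) interconnections.

```python
import numpy as np

def minimal_basis(W, tol=1e-10):
    """Factor context-specific weights W (contexts x inputs) as coeffs @ basis.

    The rank r is the number of singular values above tol relative to the
    largest one (the tolerance is an assumption of this sketch).
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    r = int(np.sum(s > tol * s[0]))
    coeffs = U[:, :r] * s[:r]   # per-context gains on each basis vector
    basis = Vt[:r]              # r shared basis weight vectors
    return coeffs, basis

def context_modulated_output(x, z, coeffs, basis):
    """Second-order unit: context gains multiply the basis projections of x."""
    gains = z @ coeffs          # shape (r,), selected by one-hot context z
    proj = basis @ x            # shape (r,), shared first-order projections
    return 1.0 / (1.0 + np.exp(-(gains @ proj)))

# Synthetic example (hypothetical data): 6 contexts, 5 inputs, true rank 2.
rng = np.random.default_rng(0)
W = rng.normal(size=(6, 2)) @ rng.normal(size=(2, 5))
coeffs, basis = minimal_basis(W)

x = rng.normal(size=5)          # one input frame
z = np.eye(6)[3]                # one-hot code for context 3
y_merged = context_modulated_output(x, z, coeffs, basis)
y_specific = 1.0 / (1.0 + np.exp(-(W[3] @ x)))
```

In this sketch the merged network stores 6 x 2 gains plus 2 x 5 basis weights instead of 6 x 5 context-specific weights, and `y_merged` agrees with `y_specific` up to floating-point error because the synthetic weights have exact rank 2. With real context-specific weights the rank would have to be chosen by inspecting the singular values, and truncation would introduce approximation error.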
