Speech Recognition Using Connectionist Networks Dissertation Proposal

The thesis of the proposed research is that connectionist networks are adequate models for the problem of acoustic phonetic speech recognition by computer. Adequacy is defined as suitably high recognition performance on a representative set of speech recognition problems. Seven acoustic phonetic problems are selected and discussed in relation to a physiological theory of phonetics. It is argued that the selected tasks are sufficiently representative and difficult to constitute a reasonable test of adequacy. A connectionist network is a fine-grained parallel distributed processing configuration in which simple processing elements are interconnected by scalar links. A connectionist network model for speech recognition, called the temporal flow model, has been defined. The model incorporates link propagation delay and internal feedback to express temporal relationships. It is contrasted with other connectionist models in which time is represented explicitly by separate processing elements for each time sample. It has been shown previously that temporal flow models can be 'trained' to perform some speech recognition tasks successfully. A method of 'learning' using techniques of numerical nonlinear optimization has been demonstrated. Methods for extending these results to the problems selected for this research are presented.

Comments: University of Pennsylvania Department of Computer and Information Science Technical Report No. MSCIS-88-44. This technical report is available at ScholarlyCommons: http://repository.upenn.edu/cis_reports/697
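As a rough illustration of the two mechanisms the abstract attributes to the temporal flow model (link propagation delay and internal feedback), the following minimal Python sketch simulates a single processing element over discrete time steps. It is not the author's formulation: the sigmoid activation, the one-step feedback delay, and the names `run_temporal_unit`, `delayed_weights`, and `feedback_weight` are all assumptions made for illustration.

```python
import math


def sigmoid(x):
    """Logistic squashing function (an assumed choice of activation)."""
    return 1.0 / (1.0 + math.exp(-x))


def run_temporal_unit(inputs, delayed_weights, feedback_weight, bias=0.0):
    """Simulate one unit with delayed input links and a self-feedback link.

    inputs:          list of scalar input samples, one per time step
    delayed_weights: dict mapping a link delay (in time steps) to its weight
    feedback_weight: weight on the unit's own output from the previous step
    """
    outputs = []
    for t in range(len(inputs)):
        total = bias
        # Each link delivers the input sample from `delay` steps ago.
        for delay, weight in delayed_weights.items():
            if t - delay >= 0:
                total += weight * inputs[t - delay]
        # Internal feedback: the unit's previous output re-enters the sum.
        if t > 0:
            total += feedback_weight * outputs[t - 1]
        outputs.append(sigmoid(total))
    return outputs
```

With `delayed_weights={0: 1.0, 2: -1.0}`, for example, the unit responds to the difference between the current sample and the sample two steps earlier, so time is carried by the links rather than by a separate processing element per time sample.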

Raymond L. Watrous
