Speech and neural network dynamics

This thesis is concerned with two principal issues. First, the radial basis function (RBF) network is introduced and its properties are related to those of other statistical and neural network classifiers. Results from a series of speech recognition experiments using this network architecture are reported, including a continuous speech recognition task with a 571-word lexicon. Second, a study of the dynamics of a simple recurrent network model is presented. This study was performed numerically, through a survey of network power spectra and a detailed investigation of the dynamics displayed by a particular network.

Word and sentence recognition errors are reported for a continuous speech recognition system that uses RBF networks for phoneme modelling with Viterbi smoothing, with either a restricted grammar or no grammar at all. In a cytopathology task domain, the best RBF/Viterbi system produced first-choice word errors of 6% and sentence errors of 14% using a grammar of perplexity 6. By comparison, the best CSTR hidden Markov model configuration gave word errors of 4% and sentence errors of 8%. RBF networks were also applied to a static vowel labelling task using hand-segmented vowels excised from continuous speech; the results were no worse than those obtained with statistical classifiers.

The second part of this thesis is a computational study of the dynamics of a recurrent neural network model, comprising two investigations. First, a survey of network power spectra was used to map out the temporal activity of the model within a four-dimensional parameter space, using summary statistics of the spectra. Second, the dynamics of a particular network were analysed using bifurcation diagrams, power spectra, the computation of Liapunov exponents and fractal dimensions, and plots of two-dimensional attractor projections.
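The abstract does not state the network equations, so the following is only a minimal sketch of how a largest Liapunov exponent can be estimated numerically, assuming a hypothetical discrete-time model x_{t+1} = tanh(W x_t + b). The weight matrix `W`, bias `b` and the helper names `largest_lyapunov` and `make_tanh_network` are illustrative, not taken from the thesis, whose model may well be a continuous-time system. A tangent vector is pushed through the Jacobian at each step and renormalised, and the exponent is the mean log growth factor: a positive value signals sensitive dependence on initial conditions, a negative one convergence to a stable fixed point or cycle.

```python
import numpy as np


def largest_lyapunov(f, jac, x0, n_steps=20000, n_transient=1000, seed=0):
    """Estimate the largest Liapunov exponent (nats per iteration) of x -> f(x).

    A tangent vector is propagated with the Jacobian and renormalised at
    every step; the exponent is the average log of its growth factor.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(n_transient):  # let the orbit settle onto its attractor
        x = f(x)
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(x.shape)
    v /= np.linalg.norm(v)
    total = 0.0
    for _ in range(n_steps):
        v = jac(x) @ v  # linearised dynamics evaluated at the current state
        x = f(x)
        norm = np.linalg.norm(v)
        total += np.log(norm)
        v /= norm
    return total / n_steps


def make_tanh_network(W, b):
    """Hypothetical network x_{t+1} = tanh(W x_t + b) and its Jacobian."""
    W = np.asarray(W, dtype=float)
    b = np.asarray(b, dtype=float)

    def f(x):
        return np.tanh(W @ x + b)

    def jac(x):
        d = 1.0 - np.tanh(W @ x + b) ** 2  # derivative of tanh for each unit
        return d[:, None] * W  # row scaling, i.e. diag(d) @ W

    return f, jac


if __name__ == "__main__":
    # Sanity check: the logistic map x -> 4x(1 - x) has exponent ln 2 (about 0.693).
    logistic = lambda x: 4.0 * x * (1.0 - x)
    logistic_jac = lambda x: np.array([[4.0 * (1.0 - 2.0 * x[0])]])
    print(largest_lyapunov(logistic, logistic_jac, [0.3]))

    # Weak self-coupling contracts to the fixed point at 0: exponent ln 0.5 < 0.
    f, jac = make_tanh_network(0.5 * np.eye(2), np.zeros(2))
    print(largest_lyapunov(f, jac, [0.2, -0.1]))
```

Sweeping one parameter (for example a gain multiplying `W`) and recording the exponent alongside the attractor would give a crude analogue of the bifurcation-diagram analysis described above; a single tangent vector only yields the largest exponent, and the full spectrum would need repeated QR re-orthonormalisation.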
Complex dynamical behaviour was observed, including Hopf bifurcations, the Ruelle-Takens-Newhouse route to chaos with mode-locking at rational winding numbers, the period-doubling route to chaos, and the presence of multiple coexisting attractors.
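Mode-locking at rational winding numbers is classically illustrated with the sine circle map, the standard toy model for the breakdown of quasiperiodic motion on a torus. The sketch below uses that textbook map, not the thesis's network; the function name `winding_number` and its parameters (`omega`, `k`) are illustrative. It iterates the lift of the map (no reduction mod 1) and measures the average rotation per step. For coupling k < 1, whole intervals of `omega` lock onto a rational winding number such as 0 or 1/2, while other values give quasiperiodic motion with an irrational winding number.

```python
import math


def winding_number(omega, k, n_iter=10000, n_transient=1000, theta0=0.0):
    """Winding number of the sine circle map
    theta -> theta + omega - (k / (2*pi)) * sin(2*pi*theta),
    iterated on the lift so that complete turns are counted.
    """
    step = lambda t: t + omega - (k / (2.0 * math.pi)) * math.sin(2.0 * math.pi * t)
    theta = theta0
    for _ in range(n_transient):  # discard transient behaviour
        theta = step(theta)
    start = theta
    for _ in range(n_iter):
        theta = step(theta)
    return (theta - start) / n_iter  # mean rotation per iteration


if __name__ == "__main__":
    # At k = 1 the winding number as a function of omega forms a
    # "devil's staircase" of plateaus at rational values.
    for omega in [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]:
        print(omega, winding_number(omega, 1.0))
```

With k = 0 the map is a pure rotation and the winding number equals `omega`; for k > 1 the map is no longer invertible, chaos becomes possible and the winding number can depend on the initial condition.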
