Initial speech recognition results using the multinet architecture

Multinet is a connectionist architecture designed for certain difficult multi-class pattern classification tasks. These tasks are characterised by very large input feature spaces, which render a monolithic classifier impractical. The architecture consists of a layer with at least one primary ‘detector’ for each class, followed by a combining net which estimates the posterior probabilities for all classes. Typically, each primary detector takes only a subset of the input features as its input. Thus the architecture decomposes classification in two ways: by class and by factoring the input space dimensions. Multinet incorporates ideas from Modular Neural Networks and Ensembles. In this paper we investigate the use of Multinet on standard speech recognition tasks and present results for phoneme recognition on TIMIT and word recognition on RM. We show that Multinet’s performance is comparable with that of standard HMM and hybrid HMM-NN systems that we run on the same tasks. The value and potential of the Multinet approach are shown by detailing successive improvements to the Multinet system, which are easily obtained because of the modularity of the architecture.
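The two-stage structure described above (per-class primary detectors, each reading a subset of the features, followed by a combining net that outputs class posteriors) can be illustrated with a minimal forward-pass sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the detectors are single logistic units, the combining net is a single softmax layer, the weights are random and untrained, and the names `Detector`, `Multinet` and `posteriors` are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Logistic function mapping a real score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def softmax(logits):
    """Numerically stable softmax; the outputs sum to 1."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


class Detector:
    """Hypothetical primary detector: one logistic unit over a feature subset."""

    def __init__(self, feature_idx, rng):
        self.feature_idx = list(feature_idx)
        # Untrained random weights, purely for illustration.
        self.w = [rng.uniform(-0.5, 0.5) for _ in self.feature_idx]
        self.b = 0.0

    def score(self, x):
        # Only the features listed in feature_idx are read from x.
        z = self.b + sum(w * x[i] for w, i in zip(self.w, self.feature_idx))
        return sigmoid(z)


class Multinet:
    """Sketch: a detector layer followed by an assumed softmax combining net."""

    def __init__(self, n_classes, subsets, seed=0):
        rng = random.Random(seed)
        # One detector per (class, feature subset), so every class has
        # at least one detector as long as subsets is non-empty.
        self.detectors = [
            Detector(s, rng) for _ in range(n_classes) for s in subsets
        ]
        n_det = len(self.detectors)
        # Combining net: one weight row per class over all detector outputs.
        self.W = [
            [rng.uniform(-0.5, 0.5) for _ in range(n_det)]
            for _ in range(n_classes)
        ]
        self.b = [0.0] * n_classes

    def posteriors(self, x):
        d = [det.score(x) for det in self.detectors]
        logits = [
            b + sum(w * v for w, v in zip(row, d))
            for row, b in zip(self.W, self.b)
        ]
        return softmax(logits)


# Usage: 3 classes, 4 input features factored into two subsets.
net = Multinet(n_classes=3, subsets=[[0, 1], [2, 3]])
p = net.posteriors([0.1, 0.2, 0.3, 0.4])
```

In this sketch the decomposition by class appears as the separate detectors per class, and the factoring of the input space appears as the `subsets` argument. Because each detector is an independent unit, one detector could be retrained or replaced without touching the others, which is the kind of modularity the abstract refers to.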
