Speech recognition in a neural network framework: discriminative training of Gaussian models and mixture densities as radial basis functions

The author presents a probabilistic interpretation of two issues in neural network approaches, namely discriminative training and radial basis functions. For the general case of many classes and continuous-valued pattern vectors, it is shown that discriminative training based on squared error or relative entropy amounts to approximating the class posterior probabilities. In addition, the concept of radial basis functions is interpreted as an approximation to class-conditional probability density functions. From this point of view, continuous mixture densities are considered to be a special kind of radial basis function. Experimental tests were performed on the TI/NIST digit string database. The preliminary results indicate that results obtained with maximum likelihood training can be improved by discriminative training.
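The link between mixture densities and radial basis functions can be illustrated with a minimal sketch. It is not the author's implementation: the function names, the isotropic Gaussian form, and the toy parameters below are all assumptions made for illustration. Each class-conditional density is a weighted sum of Gaussian basis functions that depend only on the distance between the pattern vector and a center, and Bayes' rule turns these densities into the class posterior probabilities that discriminative training aims to approximate.

```python
import math


def gaussian_rbf(x, mean, var):
    """Isotropic Gaussian density: a radial function of the distance |x - mean|."""
    dim = len(x)
    sq_dist = sum((xi - mi) ** 2 for xi, mi in zip(x, mean))
    norm = (2.0 * math.pi * var) ** (-dim / 2.0)
    return norm * math.exp(-sq_dist / (2.0 * var))


def mixture_density(x, components):
    """Class-conditional density p(x | k) as a weighted sum of Gaussian basis functions.

    components is a list of (weight, mean, variance) triples (hypothetical layout).
    """
    return sum(w * gaussian_rbf(x, m, v) for w, m, v in components)


def class_posteriors(x, priors, class_mixtures):
    """Bayes' rule: p(k | x) = p(k) p(x | k) / sum_j p(j) p(x | j)."""
    joint = [p * mixture_density(x, mix) for p, mix in zip(priors, class_mixtures)]
    total = sum(joint)  # assumed nonzero; underflows for points far from every center
    return [j / total for j in joint]


# Toy two-class problem with made-up parameters (not taken from the paper).
priors = [0.5, 0.5]
class_mixtures = [
    [(0.5, [-1.0, 1.0], 1.0), (0.5, [-1.0, -1.0], 1.0)],  # class 0
    [(0.5, [1.0, 1.0], 1.0), (0.5, [1.0, -1.0], 1.0)],    # class 1
]

posteriors_at_origin = class_posteriors([0.0, 0.0], priors, class_mixtures)
posteriors_near_class1 = class_posteriors([1.0, 1.0], priors, class_mixtures)
```

In this sketch the parameters are fixed by hand. Maximum likelihood training would fit each class mixture to its own data, whereas discriminative training would adjust all parameters jointly so that the posteriors themselves match the class labels.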
