A partitioned neural network approach for vowel classification using smoothed time/frequency features

A novel pattern classification technique and a new feature extraction method are described and tested for vowel classification. The classification technique partitions an N-way classification task into N*(N-1)/2 two-way classification tasks, each performed by a neural network classifier trained to discriminate the two members of one pair of categories. The multiple two-way decisions are then combined to form an N-way decision. Advantages of this approach include the ability to optimize features and classifiers independently for each pair of categories, reduced sensitivity of classification performance to network parameters, a reduction in the amount of training data required, and the potential for superior performance relative to a single large network. The features described in this paper, closely related to the cepstral coefficients and delta cepstra commonly used in speech analysis, are developed within a unified mathematical framework that allows arbitrary nonlinear frequency, amplitude, and time scales to compactly represent the spectral/temporal characteristics of speech. This classification approach, combined with a feature-ranking algorithm that selected the 35 most discriminative spectral/temporal features for each vowel pair, yielded 71.5% accuracy for classification of 16 vowels extracted from the TIMIT database. These results, significantly higher than other published results for the same task, illustrate the potential of the methods presented in this paper.
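The pairwise partitioning scheme described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: each two-way "network" is stood in for by a simple nearest-centroid rule, and the N-way decision is formed by majority vote over the N*(N-1)/2 pairwise decisions. The class name and the centroid stand-in are assumptions for illustration only.

```python
from itertools import combinations
import numpy as np

class PairwisePartitionedClassifier:
    """N-way classification via N*(N-1)/2 two-way classifiers plus voting.

    Illustrative sketch: each pairwise classifier is a nearest-centroid
    rule standing in for the small neural networks used in the paper.
    """

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = {}
        for a, b in combinations(self.classes_, 2):
            # In the paper, feature selection (the 35 most discriminative
            # features) would also be done per pair at this point.
            self.centroids_[(a, b)] = (X[y == a].mean(axis=0),
                                       X[y == b].mean(axis=0))
        return self

    def predict(self, X):
        # Each pairwise classifier casts one vote per sample; the class
        # with the most votes wins the N-way decision.
        votes = np.zeros((len(X), len(self.classes_)), dtype=int)
        index = {c: i for i, c in enumerate(self.classes_)}
        for (a, b), (ca, cb) in self.centroids_.items():
            da = np.linalg.norm(X - ca, axis=1)
            db = np.linalg.norm(X - cb, axis=1)
            winner = np.where(da <= db, index[a], index[b])
            votes[np.arange(len(X)), winner] += 1
        return self.classes_[votes.argmax(axis=1)]
```

Because each pairwise classifier is trained only on data from its two categories, it can be tuned (and its features selected) independently of the other pairs, which is the main practical advantage the abstract claims for the partitioned approach.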
