Comparison of Gaussian and neural network classifiers on vowel recognition using the discrete cosine transform

Results are reported for experiments that use a discrete cosine transform (DCT) to represent vowel spectra for classification by a neural network, and are compared with a Gaussian classifier trained on the same database. The results show that the DCT classifies vowels using fewer coefficients than the cepstrum. The neural network classifier performs better than the Gaussian classifier, especially with large input feature sets that include delta coefficients and formant/pitch features. The best performance obtained with these features was 58.2%, which compares well with other results reported for these data.
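As an illustration of the feature-extraction idea (not the authors' exact pipeline), the sketch below computes a small number of DCT coefficients from a vowel frame's log magnitude spectrum, which is the kind of compact spectral representation the abstract describes. The analysis parameters (window, number of coefficients, sampling rate) are assumptions for the example only.

```python
# Minimal sketch: low-order DCT coefficients of a log magnitude spectrum
# as compact vowel features. Parameters below are illustrative assumptions,
# not values taken from the paper.
import numpy as np
from scipy.fft import dct

def dct_spectral_features(frame, n_coeffs=10):
    """Return the first n_coeffs DCT coefficients of the frame's log spectrum."""
    windowed = frame * np.hamming(len(frame))       # taper to reduce spectral leakage
    spectrum = np.abs(np.fft.rfft(windowed))        # magnitude spectrum
    log_spec = np.log(spectrum + 1e-10)             # log compression, avoid log(0)
    coeffs = dct(log_spec, type=2, norm="ortho")    # cosine transform of the log spectrum
    return coeffs[:n_coeffs]                        # keep only the low-order coefficients

# Hypothetical usage on a 25 ms frame of 16 kHz speech:
frame = np.random.randn(400)                        # placeholder for real vowel samples
features = dct_spectral_features(frame, n_coeffs=10)
print(features.shape)                               # -> (10,)
```

These low-order coefficients would then be fed, possibly alongside delta and formant/pitch features, to either a Gaussian or a neural network classifier of the kind the abstract compares.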
