Explicit, N-best formant features for vowel classification

We demonstrate the use of explicit formant features for vowel and semivowel classification. The formant trajectories are approximated by either three line segments or Legendre polynomials. Together with formant amplitude, formant bandwidth, pitch, and segment duration, these formant features form a compact feature representation that performs as well (71.8% accuracy) as a cepstral-based feature representation (71.6%). Combining the formant and cepstral features improves the accuracy further, to 73.4%. Additionally, we outline future experiments using our robust, N-best formant tracker.
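As an illustration of the two trajectory approximations named in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes a formant track is given as one frequency value per frame, that the three line segments cover equal thirds of the segment, and that a degree-2 Legendre fit is used; none of these choices are stated in the abstract. The helper names `legendre_features` and `three_segment_features` are hypothetical.

```python
import numpy as np
from numpy.polynomial import legendre


def legendre_features(track_hz, degree=2):
    """Legendre coefficients of one formant track (hypothetical helper).

    The degree is an assumption; the abstract does not state it.
    """
    track = np.asarray(track_hz, dtype=float)
    # Map the frames onto [-1, 1], the interval on which
    # Legendre polynomials are orthogonal.
    t = np.linspace(-1.0, 1.0, len(track))
    return legendre.legfit(t, track, degree)


def three_segment_features(track_hz):
    """Least-squares line per third of the track (hypothetical helper).

    Equal-length thirds are an assumption. Returns
    [slope1, intercept1, slope2, intercept2, slope3, intercept3],
    with slopes in Hz per frame.
    """
    track = np.asarray(track_hz, dtype=float)
    frames = np.arange(len(track))
    feats = []
    for idx in np.array_split(frames, 3):
        slope, intercept = np.polyfit(idx, track[idx], 1)
        feats.extend([slope, intercept])
    return np.array(feats)


# Synthetic, linearly rising F2 track: 30 frames from 1200 Hz to 1800 Hz.
f2 = np.linspace(1200.0, 1800.0, 30)
coeffs = legendre_features(f2)
segs = three_segment_features(f2)
```

Either representation reduces a variable-length track to a fixed number of values (three coefficients or six line parameters here), which is what makes the feature vector compact regardless of segment duration.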
