BAYESIAN LEARNING FOR MODELS OF HUMAN SPEECH PERCEPTION

Human speech recognition error rates are 30 times lower than machine error rates. Psychophysical experiments have pinpointed a number of specific human behaviors that may contribute to accurate speech recognition, but previous attempts to incorporate such behaviors into automatic speech recognition have often failed because the resulting models could not be easily trained from data. This paper describes Bayesian learning methods for computational models of human speech perception. Specifically, the linked computational models proposed in this paper seek to imitate the following human behaviors: independence of distinctive feature errors, the perceptual magnet effect, the vowel sequence illusion, sensitivity to energy onsets and offsets, and redundant use of asynchronous acoustic correlates. The proposed models differ from many previous computational psychological models in that the desired behavior is learned from data, using a constrained optimization algorithm (the EM algorithm), rather than being coded into the model as a series of fixed rules.
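The abstract names the EM algorithm as the constrained maximum-likelihood training procedure but gives no details. As a rough illustration of that style of training, the sketch below fits a simple one-dimensional Gaussian mixture with EM; the model, variable names, and toy data are hypothetical and are not the paper's actual perceptual models.

```python
import numpy as np

def em_gmm(x, k=2, iters=50, seed=0):
    """Fit a 1-D Gaussian mixture with EM (illustrative sketch only)."""
    rng = np.random.default_rng(seed)
    n = len(x)
    # Initialize mixture weights, means, and variances.
    w = np.full(k, 1.0 / k)
    mu = rng.choice(x, k, replace=False)
    var = np.full(k, np.var(x))
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each sample.
        dens = np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        resp = w * dens
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters from expected sufficient statistics.
        nk = resp.sum(axis=0)
        w = nk / n
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return w, mu, var

# Toy usage: two synthetic "formant-like" clusters (hypothetical data).
x = np.concatenate([np.random.normal(500, 40, 200), np.random.normal(700, 40, 200)])
print(em_gmm(x))
```

Each EM iteration is guaranteed not to decrease the data likelihood, which is why the same recipe extends to the richer, constrained perceptual models the paper trains.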
