Modeling the identification of concurrent vowels with different fundamental frequencies.

Human listeners identify two simultaneous vowels more accurately when the vowels have different fundamental frequencies. A computational model is presented which, for the first time, simulates this phenomenon at least qualitatively. The first stage of the model is a bank of bandpass filters and inner hair-cell simulators that approximates the most relevant characteristics of the human auditory periphery. The output of each filter/hair-cell channel is then autocorrelated to extract pitch and timbre information. The autocorrelation functions (ACFs) of all channels are pooled, and the pooled ACF is used to estimate the pitch of one of the two component vowels in the composite signal. The ACFs of individual channels that show a pitch peak at this value are combined and used to identify the first vowel by template matching. The ACFs of the remaining channels are then combined and used to identify the second vowel. In a manner closely resembling human performance, the model's vowel identification improves rapidly as the difference between the fundamental frequencies of the two vowels increases from zero to one semitone. As the difference increases further, up to four semitones, performance improves only slowly, if at all.
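
The pitch-guided grouping of channels described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the filter bank and hair-cell stage are replaced by synthetic pulse-train channels, and the function names (`channel_acfs`, `segregate_channels`, `closest_template`), the lag tolerance `tol`, the lag search range, and the squared-Euclidean template distance are all assumptions made for illustration.

```python
import numpy as np

def channel_acfs(channels, max_lag):
    """Unbiased autocorrelation of each channel for lags 0..max_lag."""
    channels = np.asarray(channels, dtype=float)
    n = channels.shape[1]
    acfs = np.empty((channels.shape[0], max_lag + 1))
    for lag in range(max_lag + 1):
        # Mean product of each channel with a copy of itself delayed by `lag`.
        products = channels[:, : n - lag] * channels[:, lag:]
        acfs[:, lag] = products.sum(axis=1) / (n - lag)
    return acfs

def segregate_channels(acfs, min_lag, tol=1):
    """Split channels by whether their ACF peaks at the pooled pitch lag.

    `min_lag` and `tol` are illustrative parameters, not values from the paper.
    """
    pooled = acfs.sum(axis=0)
    # Pitch estimate: largest pooled-ACF peak at or above the shortest allowed lag.
    pitch_lag = min_lag + int(np.argmax(pooled[min_lag:]))
    # Each channel's own dominant lag in the same search range.
    channel_peaks = min_lag + np.argmax(acfs[:, min_lag:], axis=1)
    in_group = np.abs(channel_peaks - pitch_lag) <= tol
    first = acfs[in_group].sum(axis=0)    # channels following the estimated pitch
    second = acfs[~in_group].sum(axis=0)  # all remaining channels
    return pitch_lag, in_group, first, second

def closest_template(acf, templates):
    """Name of the template with the smallest squared Euclidean distance (assumed metric)."""
    return min(templates, key=lambda name: np.sum((acf - templates[name]) ** 2))

def pulse_train(period, n):
    """Synthetic stand-in for one filter/hair-cell channel output."""
    x = np.zeros(n)
    x[::period] = 1.0
    return x

# Three channels dominated by a 100-sample period, two by a 94-sample period
# (100 Hz and roughly 106 Hz at an assumed 10 kHz sample rate).
fs = 10_000
channels = np.stack([pulse_train(100, 2000)] * 3 + [pulse_train(94, 2000)] * 2)
acfs = channel_acfs(channels, max_lag=150)
pitch_lag, in_group, first, second = segregate_channels(acfs, min_lag=40)
print("estimated pitch (Hz):", fs / pitch_lag)
print("channels assigned to first vowel:", in_group)
```

In this toy case the pooled ACF is dominated by the three channels sharing a period, so those channels form the first group and the other two are left for the second vowel. In a fuller model the two combined ACFs would each be passed to `closest_template` with a dictionary of stored single-vowel ACFs.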
