Modelling the perceptual segregation of double vowels with a network of neural oscillators

Abstract The ability of listeners to identify two simultaneously presented vowels can be improved by introducing a difference in fundamental frequency (F0) between the vowels. We propose an explanation for this phenomenon in the form of a computational model of concurrent sound segregation, which is motivated by neurophysiological evidence of oscillatory firing activity in the auditory cortex and thalamus. More specifically, the model represents the perceptual grouping of auditory frequency channels as synchronised oscillations (phase-locked with zero phase lag) in a neural network. Computer simulations on a vowel set used in psychophysical studies confirm that the model qualitatively matches the performance of human listeners: vowel identification performance increases with increasing difference in F0. Additionally, the model is able to replicate other findings relating to the perception of harmonic complexes in which one component is mistuned.
