Primitive Auditory Segregation Based on Oscillatory Correlation

Auditory scene analysis is critical for complex auditory processing. We study auditory segregation from the neural network perspective and develop a framework for primitive auditory scene analysis. The architecture is a laterally coupled two-dimensional network of relaxation oscillators with a global inhibitor. One dimension represents time and the other represents frequency. We show that this architecture, together with systematic delay lines, can group auditory features into a stream in real time by phase synchrony and segregate different streams by desynchronization. The network reproduces a set of psychological phenomena regarding primitive auditory scene analysis, including dependence on frequency proximity and on the rate of presentation, sequential capturing, and competition among different perceptual organizations. We offer a neurocomputational theory, the shifting synchronization theory, to explain how auditory segregation might be achieved in the brain and to account for the psychological phenomenon of stream segregation. Possible extensions of the model are discussed.
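
The abstract names relaxation oscillators as the building block of the network but gives no equations. As a rough illustration only, the sketch below integrates a single relaxation oscillator of the Terman-Wang form that is common in the oscillatory correlation literature. The equations, the parameter values (`eps`, `gamma`, `beta`), the stimulus levels, and the helper names are all assumptions made for this example, not details taken from the abstract. It shows only the basic property such networks rely on: a stimulated unit alternates between a silent and an active phase, while an unstimulated unit stays silent. Lateral coupling, the global inhibitor, and the delay lines are not modeled.

```python
import math


def simulate_oscillator(stimulus, t_end=500.0, dt=0.01,
                        eps=0.02, gamma=6.0, beta=0.1):
    """Forward-Euler integration of one relaxation oscillator.

    Assumed (hypothetical for this document) Terman-Wang form:
        dx/dt = 3x - x^3 + 2 - y + stimulus
        dy/dt = eps * (gamma * (1 + tanh(x / beta)) - y)
    Returns the trajectory of the fast variable x.
    """
    x, y = -1.5, 2.0  # arbitrary start near the silent (left) branch
    xs = []
    for _ in range(int(t_end / dt)):
        dx = 3.0 * x - x ** 3 + 2.0 - y + stimulus
        dy = eps * (gamma * (1.0 + math.tanh(x / beta)) - y)
        x += dt * dx
        y += dt * dy
        xs.append(x)
    return xs


def count_active_phases(xs, threshold=0.0):
    """Count upward crossings of the threshold (jumps into the active phase)."""
    return sum(1 for a, b in zip(xs, xs[1:]) if a <= threshold < b)


# A positive stimulus should give repeated jumps; a negative one should give none.
stimulated = count_active_phases(simulate_oscillator(0.8))
unstimulated = count_active_phases(simulate_oscillator(-0.2))
print("active phases (stimulated):", stimulated)
print("active phases (unstimulated):", unstimulated)
```

In a full network of the kind the abstract describes, grouping would correspond to several such units jumping to the active phase together, and segregation to different groups jumping at different times. This single-unit sketch does not demonstrate either effect.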
