Primary segmentation of auditory scenes

This work addresses the problem of how humans separate a complex auditory scene into its components, a process known as auditory scene analysis. From a hierarchical point of view, auditory scene analysis can be described as the grouping of auditory entities at one level of representation to form entities at a higher level. One of the main difficulties in describing this hierarchical construct is characterizing the first level of auditory representation, i.e., the elementary auditory entities. This work defines auditory elementary units and describes a computational model for their formation. Each unit represents an independent acoustic component that forms part of the output of a single acoustic source. Because elementary units constitute the first level of auditory analysis and serve as the basis for further grouping and for the formation of meaningful mental representations of auditory sources, this work should be seen as a first stage towards a more general model of auditory scene analysis, one that also models the grouping of auditory elementary units.