On Ideal Binary Mask As the Computational Goal of Auditory Scene Analysis

In his famous treatise on computational vision, Marr (1982) makes a compelling argument for separating different levels of analysis in order to understand complex information processing. In particular, the computational-theory level, concerned with the goal of a computation and its general processing strategy, must be separated from the algorithm level; that is, what is computed must be separated from how it is computed. This chapter is an attempt at a computational-theory analysis of auditory scene analysis, whose main task is to understand the character of the CASA problem.
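As a concrete illustration of the computational goal named in the title, the ideal binary mask is conventionally defined over a time-frequency representation: a T-F unit is retained when the target energy in that unit exceeds the interference energy by a local criterion (LC, in dB), and discarded otherwise. The sketch below is a minimal, hypothetical implementation of that standard definition; the function name, array shapes, and toy energy values are illustrative assumptions, not drawn from this chapter.

```python
import numpy as np

def ideal_binary_mask(target_power, interference_power, lc_db=0.0):
    """Compute an ideal binary mask over a time-frequency representation.

    A T-F unit is kept (mask = 1) when the local target-to-interference
    energy ratio exceeds the local criterion lc_db (in dB); otherwise 0.
    Comparing energies directly avoids division by zero in silent units.
    """
    threshold = 10.0 ** (lc_db / 10.0)
    return (target_power > threshold * interference_power).astype(np.uint8)

# Toy example: 2 frequency channels x 3 time frames of premixed energies.
target = np.array([[4.0, 1.0, 0.5],
                   [0.1, 9.0, 2.0]])
noise = np.array([[1.0, 2.0, 0.5],
                  [0.2, 1.0, 8.0]])
mask = ideal_binary_mask(target, noise, lc_db=0.0)
# With LC = 0 dB, mask is 1 exactly where target energy exceeds noise energy.
```

Note that the mask is "ideal" because it presupposes access to the target and interference signals before mixing; it serves as a ground-truth goal for separation systems rather than something computable from the mixture alone.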
