Integrating monaural and binaural cues for sound localization and segregation in reverberant environments

The problem of segregating a sound source of interest from an acoustic background has been studied extensively because of its applications in hearing prostheses, robust speech and speaker recognition, and audio information retrieval. Computational auditory scene analysis (CASA) approaches the segregation problem by exploiting the grouping cues that human listeners use in the perceptual organization of sound. Binaural processing, where the input signals resemble those that enter the two ears, is of particular interest in the CASA field. The dominant approach to binaural segregation has been to derive spatially selective filters that enhance the signal arriving from a direction of interest. As such, the problems of sound localization and sound segregation are closely tied. Although spatial filtering has been widely used, its performance degrades substantially in reverberant environments and, more fundamentally, it cannot segregate sources that lack sufficient spatial separation. This dissertation addresses the problems of binaural localization and segregation in reverberant environments by integrating monaural and binaural cues. Motivated by research in psychoacoustics and by developments in monaural CASA processing, we first develop a probabilistic framework for joint localization and segregation of voiced speech. Pitch cues are used to group sound components across frequency over continuous time intervals. The time-frequency regions resulting from this partial organization are then localized by integrating binaural cues across each region, which enhances robustness to reverberation, and the regions are grouped across time based on their estimated locations. We demonstrate that this approach outperforms voiced speech segregation based on either monaural or binaural analysis alone. We also demonstrate substantial gains in multisource localization performance, particularly for distant sources in reverberant environments and at low signal-to-noise ratios.
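The localization step described above, in which binaural evidence is integrated over a pitch-grouped time-frequency region, can be illustrated with a minimal sketch. It is not the dissertation's implementation: the function name `localize_region`, the input layout, and the assumption that units contribute independent per-azimuth log-likelihoods (so that they can simply be summed) are all hypothetical choices made for illustration.

```python
import numpy as np


def localize_region(loglik, region_mask):
    """Estimate one azimuth for a whole time-frequency region.

    loglik      : array (n_units, n_azimuths); hypothetical per-unit
                  log-likelihood of each candidate azimuth, e.g. derived
                  from interaural time and level differences.
    region_mask : boolean array (n_units,); True for units that pitch-based
                  grouping assigned to the region.
    Returns the index of the best azimuth and the pooled log-likelihoods.
    """
    # Assumption: units are treated as independent, so evidence adds in
    # the log domain. A unit corrupted by reverberation can then be
    # outvoted by the other units in the same region.
    pooled = loglik[region_mask].sum(axis=0)
    return int(np.argmax(pooled)), pooled


# Three units in the region disagree when taken one at a time
# (their individual best azimuths are 1, 2 and 0), and a fourth
# unit lies outside the region and is ignored.
loglik = np.array([
    [-1.0, -0.5, -3.0],
    [-2.0, -0.4, -0.3],
    [-0.2, -0.6, -4.0],
    [0.0, 0.0, -100.0],
])
region = np.array([True, True, True, False])
best, pooled = localize_region(loglik, region)
```

In this toy example the pooled evidence favors the middle azimuth even though only one of the three units picks it on its own, which is the intuition behind localizing regions rather than individual units.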
We then develop a binaural system for joint localization and segregation of an unknown and time-varying number of sources that is more flexible and requires less prior information than our initial system. This framework incorporates models trained jointly on pitch and azimuth cues, which improves performance and naturally handles both voiced and unvoiced speech. Experimental results show that the proposed approach outperforms existing two-microphone systems despite using less prior information. We also consider how the computational goal of CASA-based segregation should be defined in reverberant environments. The ideal binary mask (IBM) has been established as a main goal of CASA. While the IBM is defined unambiguously in anechoic conditions, in reverberant environments there is some flexibility in how the target signal itself is defined, which in turn makes the definition of the IBM ambiguous. Motivated by the perceptual distinction between early and late reflections, we introduce the reflection boundary as a parameter of the IBM definition, which allows target reflections to be divided into desirable and undesirable components. We conduct a series of intelligibility tests with normal-hearing listeners to compare alternative IBM definitions. Results show that it is vital for the IBM definition to account for the energetic effect of early target reflections, and that late target reflections should be characterized as noise.
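The role of the reflection boundary in the IBM definition can be sketched as follows. This is an illustrative sketch under stated assumptions, not the definition used in the listening tests: the function names, the 50 ms default boundary, the 0 dB local criterion `lc_db`, and the use of a plain STFT in place of an auditory filterbank are all assumptions introduced here. The noise signal is assumed to be no longer than the reverberant target.

```python
import numpy as np

EPS = 1e-12  # avoids division by zero and log of zero in silent units


def split_impulse_response(h, fs, boundary_ms):
    """Split a room impulse response at boundary_ms after the direct path."""
    direct = int(np.argmax(np.abs(h)))  # assume the largest peak is the direct path
    cut = direct + 1 + int(round(boundary_ms * 1e-3 * fs))
    early = h.copy()
    early[cut:] = 0.0   # direct sound plus early reflections
    late = h - early    # reflections arriving after the boundary
    return early, late


def tf_energy(x, frame=256, hop=128):
    """Energy per STFT time-frequency unit (stand-in for an auditory filterbank)."""
    n_frames = 1 + (len(x) - frame) // hop
    window = np.hanning(frame)
    frames = np.stack([x[i * hop:i * hop + frame] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2


def reverberant_ibm(source, h, noise, fs, boundary_ms=50.0, lc_db=0.0):
    """Binary mask with early reflections counted as target, late ones as noise."""
    early, late = split_impulse_response(h, fs, boundary_ms)
    n = len(noise)
    target = np.convolve(source, early)[:n]
    residual = np.convolve(source, late)[:n] + noise  # late reflections join the noise
    snr_db = 10.0 * np.log10((tf_energy(target) + EPS) / (tf_energy(residual) + EPS))
    return snr_db > lc_db  # True where the target dominates the unit


# Toy example: direct path at sample 10, one early and one late reflection.
rng = np.random.default_rng(0)
fs = 16000
source = rng.standard_normal(4000)
h = np.zeros(2000)
h[10], h[500], h[1500] = 1.0, 0.5, 0.3
early, late = split_impulse_response(h, fs, 50.0)
mask = reverberant_ibm(source, h, np.zeros(4000), fs)
```

Varying `boundary_ms` moves reflections between the target and the noise, which changes the mask even when the mixture itself is unchanged. That dependence is the ambiguity the alternative IBM definitions are meant to resolve.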
