A comparison of several computational auditory scene analysis (CASA) techniques for monaural speech segregation

Thanks to the auditory system, humans can easily separate mixed speech and form perceptual representations of the constituent sources in an acoustic mixture. For decades, researchers have attempted to build computer models of these high-level auditory functions, yet the segregation of mixed speech remains a very challenging problem. In this paper, we focus on approaches that address monaural speech segregation. To this end, we study computational auditory scene analysis (CASA), which aims to reproduce the source organization achieved by human listeners and segregates speech from monaural mixtures in two main stages: segmentation and grouping. We present and compare several studies that have applied CASA to speech separation and recognition.
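To make the two stages concrete, the sketch below decomposes a mixture into time-frequency units (segmentation) and then labels each unit as target- or interference-dominant (grouping) before resynthesis. It is a minimal illustration only: the STFT front-end stands in for an auditory filterbank, and the ideal-binary-mask criterion, frame length, and threshold are assumptions for this example, not the configuration of any system surveyed here.

```python
# Minimal CASA-style two-stage sketch: segmentation into time-frequency (T-F)
# units, then grouping via an ideal-binary-mask criterion (illustrative only;
# the IBM requires the premixed signals and serves as a reference goal).
import numpy as np
from scipy.signal import stft, istft

def segment(mixture, fs, nperseg=512):
    """Stage 1 (segmentation): decompose the mixture into T-F units."""
    _, _, Z = stft(mixture, fs=fs, nperseg=nperseg)
    return Z  # complex T-F representation (frequency bins x frames)

def group_ideal_binary_mask(target, interference, fs, nperseg=512, lc_db=0.0):
    """Stage 2 (grouping): label each T-F unit as target-dominant or not,
    using the local SNR against a criterion lc_db (ideal binary mask)."""
    _, _, S = stft(target, fs=fs, nperseg=nperseg)
    _, _, N = stft(interference, fs=fs, nperseg=nperseg)
    local_snr = 10.0 * np.log10((np.abs(S) ** 2 + 1e-12) / (np.abs(N) ** 2 + 1e-12))
    return (local_snr > lc_db).astype(float)

def resynthesize(Z, mask, fs, nperseg=512):
    """Apply the binary mask to the mixture and invert back to a waveform."""
    _, x = istft(Z * mask, fs=fs, nperseg=nperseg)
    return x

if __name__ == "__main__":
    fs = 16000
    t = np.arange(fs) / fs
    # Toy "speech-like" target (amplitude-modulated tone) plus white noise.
    speech_like = np.sin(2 * np.pi * 220 * t) * (1 + 0.5 * np.sin(2 * np.pi * 3 * t))
    noise = 0.5 * np.random.randn(fs)
    mixture = speech_like + noise

    Z = segment(mixture, fs)
    mask = group_ideal_binary_mask(speech_like, noise, fs)
    estimate = resynthesize(Z, mask, fs)
    print("Estimated target length (samples):", len(estimate))
```

In practical CASA systems the grouping decision cannot rely on the premixed signals; it is instead driven by cues such as pitch, amplitude modulation, and onset/offset, which is precisely where the surveyed approaches differ.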
