Monaural voiced speech segregation based on elaborate harmonic grouping strategy

Monaural speech segregation is a challenging problem that has been studied by many researchers. In this paper, we focus on voiced speech segregation. Different strategies are used to segregate resolved and unresolved harmonics. For resolved harmonics, the "harmonicity" principle and a novel mechanism based on the "minimum amplitude" principle are employed. For unresolved harmonics, the amplitude modulation rate is extracted by an "enhanced" autocorrelation function of the envelope, which is more robust than the previous method. An elaborate rule is also introduced to determine which regions are dominated by resolved harmonics and which by unresolved harmonics. The proposed algorithm is evaluated on Cooke's 100 mixtures and compared with a state-of-the-art algorithm, the Hu and Wang model. Results show that the proposed algorithm is more robust than the Hu and Wang model.
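The idea of estimating an amplitude modulation rate from the autocorrelation of the envelope can be sketched as follows. This is a minimal illustration, not the paper's exact "enhanced" autocorrelation: the half-wave-rectification envelope extractor, the window length, the search range, and the function name are all illustrative assumptions.

```python
import numpy as np

def am_rate_from_envelope(x, fs, min_hz=80, max_hz=400):
    """Estimate the amplitude-modulation rate (Hz) of a signal from the
    autocorrelation of its envelope (illustrative sketch, not the paper's
    exact method)."""
    # 1) Envelope: half-wave rectify, then moving-average low-pass to
    #    suppress the carrier while keeping the slower modulation.
    rect = np.maximum(x, 0.0)
    win = max(int(fs / (2 * max_hz)), 1)
    env = np.convolve(rect, np.ones(win) / win, mode="same")
    env = env - env.mean()
    # 2) Autocorrelation of the zero-mean envelope (non-negative lags).
    ac = np.correlate(env, env, mode="full")[len(env) - 1:]
    # 3) Pick the strongest peak inside the plausible modulation-rate
    #    range; the AM rate is fs divided by that lag.
    lo, hi = int(fs / max_hz), int(fs / min_hz)
    lag = lo + int(np.argmax(ac[lo:hi + 1]))
    return fs / lag
```

For a 2 kHz carrier amplitude-modulated at 200 Hz, the envelope autocorrelation peaks at a lag of about fs/200 samples, so the function recovers a rate near 200 Hz; in a CASA system the same measurement would be applied per filter channel to unresolved-harmonic regions.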

[1] R. Carlyon et al., "Comparing the fundamental frequencies of resolved and unresolved harmonics: Evidence for two pitch mechanisms?," 1994.

[2] DeLiang Wang et al., "A Tandem Algorithm for Pitch Estimation and Voiced Speech Segregation," IEEE Transactions on Audio, Speech, and Language Processing, 2010.

[3] Kohlrausch et al., "The influence of carrier level and frequency on modulation and beat-detection thresholds for sinusoidal carriers," The Journal of the Acoustical Society of America, 2000.

[4] E. de Boer et al., "On cochlear encoding: Potentialities and limitations of the reverse-correlation technique," 1978.

[5] DeLiang Wang et al., "On Ideal Binary Mask As the Computational Goal of Auditory Scene Analysis," in Speech Separation by Humans and Machines, 2005.

[6] DeLiang Wang et al., "Monaural speech segregation based on pitch tracking and amplitude modulation," IEEE Transactions on Neural Networks, 2002.

[7] Zbynek Koldovsky et al., "Time-Domain Blind Separation of Audio Sources on the Basis of a Complete ICA Decomposition of an Observation Space," IEEE Transactions on Audio, Speech, and Language Processing, 2011.

[8] Guy J. Brown et al., "Separation of speech from interfering sounds based on oscillatory correlation," IEEE Transactions on Neural Networks, 1999.

[9] Ray Meddis et al., "Virtual pitch and phase sensitivity of a computer model of the auditory periphery," 1991.

[10] Nobuhiko Kitawaki et al., "Combined approach of array processing and independent component analysis for blind separation of acoustic signals," IEEE Transactions on Speech and Audio Processing, 2003.

[11] Matti Karjalainen et al., "A computationally efficient multipitch analysis model," IEEE Transactions on Speech and Audio Processing, 2000.

[12] Guy J. Brown et al., "Computational auditory scene analysis," Computer Speech and Language, 1994.

[13] Guy J. Brown et al., Computational Auditory Scene Analysis: Principles, Algorithms, and Applications, 2006.

[14] Jacob Benesty et al., Speech Enhancement, 2010.

[15] Peng Li et al., "Monaural speech separation based on MAXVQ and CASA for robust speech recognition," Computer Speech and Language, 2010.

[16] Hermann von Helmholtz, On the Sensations of Tone, 1954.

[17] Anssi Klapuri, "Auditory-Model Based Methods for Multiple Fundamental Frequency Estimation," 2006.

[18] A. M. Mimpen et al., "The ear as a frequency analyzer. II," The Journal of the Acoustical Society of America, 1964.

[19] Albert S. Bregman, Auditory Scene Analysis, 2001.

[20] Kuldip K. Paliwal et al., "Single-channel speech enhancement using spectral subtraction in the short-time modulation domain," Speech Communication, 2010.

[21] S. Boll, "Suppression of acoustic noise in speech using spectral subtraction," 1979.

[22] R. Meddis, "Simulation of auditory-neural transduction: further studies," The Journal of the Acoustical Society of America, 1988.