Perceptually motivated blind source separation of convolutive audio mixtures with subspace filtering methods

Blind source separation aims to recover independent sources from multiple observed mixtures of them, typically using independent component analysis (ICA). However, when the technique is applied to audio mixtures, such as several people talking in a room, its performance is greatly reduced by room reflections and ambient noise. In contrast, the human auditory system, with only two "microphones" (the ears), performs well in such a real cocktail-party environment. Over the last few years, the application of psychoacoustic principles (models of human auditory perception) has led to the successful development of the MPEG audio coding standard, the basic technique behind MP3 players and the availability of music on the internet. The first objective of this thesis is to apply psychoacoustic principles to the spatial processing of speech signals in noisy and reverberant environments. The key assumption adopted is that modern signal processing has failed to mimic the cocktail-party effect because there has been no attempt to adequately incorporate the psychoacoustic phenomenon of audio masking to aid source separation. A quasi-linear mechanism for mimicking simultaneous (frequency) masking and temporal masking (post-masking) is developed. This framework is used to construct blind source separation algorithms that exploit audio masking before source separation (as a preprocessor) and after source separation (as a postprocessor). The final objective of this thesis is to exploit the perceptual irrelevancy of parts of the input speech spectrum, using the perceptual masking techniques, before applying the subspace method as a preprocessor to frequency-domain ICA (FDICA): the subspace method first reduces the effect of room reflections, and the remaining direct sounds are then separated by ICA.
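The abstract names the masking mechanism without giving its form. As an illustration only, the sketch below computes a simultaneous masking threshold by spreading each bin's power with the Schroeder spreading function on the Zwicker and Terhardt Bark scale, adds first-order exponential post-masking across frames, and flags bins below the threshold as perceptually irrelevant. It is not the thesis's quasi-linear mechanism: the function names, the 10 dB masking offset, the 16 ms hop and the 50 ms decay constant are all hypothetical choices made for the example.

```python
import math

def hz_to_bark(f_hz):
    # Zwicker & Terhardt approximation of the critical-band (Bark) scale.
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)

def spreading_db(dz):
    # Schroeder spreading function in dB; dz = z_maskee - z_masker (Bark).
    # Slopes tend to about -25 dB/Bark below the masker and -10 dB/Bark above it.
    x = dz + 0.474
    return 15.81 + 7.5 * x - 17.5 * math.sqrt(1.0 + x * x)

def simultaneous_threshold(power, freqs_hz, offset_db=10.0):
    """Per-bin threshold: linear sum of spread masker powers, lowered by a
    (hypothetical) fixed offset of offset_db."""
    barks = [hz_to_bark(f) for f in freqs_hz]
    gain = 10.0 ** (-offset_db / 10.0)
    threshold = []
    for z_j in barks:
        total = sum(p * 10.0 ** (spreading_db(z_j - z_k) / 10.0)
                    for p, z_k in zip(power, barks))
        threshold.append(gain * total)
    return threshold

def post_masking(thresholds, hop_s=0.016, tau_s=0.05):
    """Temporal post-masking (assumed form): each bin's threshold decays
    exponentially from frame to frame instead of dropping immediately."""
    decay = math.exp(-hop_s / tau_s)
    out, prev = [], None
    for frame in thresholds:
        if prev is not None:
            frame = [max(t, decay * p) for t, p in zip(frame, prev)]
        out.append(frame)
        prev = frame
    return out

def audible_mask(power, threshold):
    # 1 keeps a bin; 0 marks it as perceptually irrelevant (below the threshold).
    return [1 if p >= t else 0 for p, t in zip(power, threshold)]

# Illustrative single frame: a strong 1000 Hz component next to a weak 1100 Hz one.
freqs = [500.0, 1000.0, 1100.0, 4000.0]
power = [1.0, 100.0, 0.01, 0.5]
thr = simultaneous_threshold(power, freqs)
print(audible_mask(power, thr))  # [1, 1, 0, 1]: the 1100 Hz bin is masked
```

Summing the spread contributions in the linear power domain keeps the threshold a linear function of the spectrum for a fixed set of bin frequencies. That property is one plausible reading of "quasi-linear", but it remains an assumption here.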
Incorporating the perceptual masking techniques prior to the application of MICA, with the subspace method as a preprocessor, not only reduces the computational complexity of the similarity measure used to solve the permutations but also avoids the so-called permutation problem by targeting one specific speech signal and producing an estimate of it that is more intelligible than the available microphone signals. Experiments carried out in both synthetic and real room scenarios show good objective performance in terms of signal-to-interference ratio (SIR) and enhanced modified Bark spectral distortion (EMBSD), confirming the validity of the proposed solutions.
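The abstract describes the subspace method only by its role: it is applied per frequency bin before FDICA and reduces room reflections so that ICA mainly sees the direct sounds. The NumPy sketch below shows one common form of such a preprocessor. It assumes that the number of sources N is known, that the mixtures are already in the STFT domain with layout (bins, microphones, frames), and that the retained subspace is also whitened. The function names and that layout are hypothetical. The sketch covers neither the masking stage nor the ICA stage, and it is not the thesis's implementation.

```python
import numpy as np

def subspace_filter(X, n_sources):
    """Project one frequency bin's observations onto its signal subspace.

    X: complex array (n_mics, n_frames) of STFT coefficients for one bin.
    Returns (Y, W) with Y = W @ X of shape (n_sources, n_frames).
    """
    n_mics, n_frames = X.shape
    R = X @ X.conj().T / n_frames                 # spatial correlation matrix (Hermitian)
    eigvals, eigvecs = np.linalg.eigh(R)          # real eigenvalues, ascending order
    idx = np.argsort(eigvals)[::-1][:n_sources]   # N dominant eigen-directions
    lam = eigvals[idx]
    E_s = eigvecs[:, idx]                         # signal-subspace basis
    # Whitening + dimension reduction: the M - N discarded directions hold
    # part of the diffuse (reflection / noise) energy.
    W = np.diag(1.0 / np.sqrt(lam)) @ E_s.conj().T
    return W @ X, W

def subspace_preprocess(stft_mix, n_sources):
    """Apply subspace_filter independently in every bin of a
    (n_bins, n_mics, n_frames) array; output is (n_bins, n_sources, n_frames)."""
    n_bins, _, n_frames = stft_mix.shape
    out = np.empty((n_bins, n_sources, n_frames), dtype=complex)
    for f in range(n_bins):
        out[f], _ = subspace_filter(stft_mix[f], n_sources)
    return out

# Synthetic check: 2 super-Gaussian sources, 4 microphones, small sensor noise.
rng = np.random.default_rng(0)
n_bins, n_mics, n_frames = 3, 4, 2000
S = rng.laplace(size=(n_bins, 2, n_frames)) + 1j * rng.laplace(size=(n_bins, 2, n_frames))
A = rng.normal(size=(n_bins, n_mics, 2)) + 1j * rng.normal(size=(n_bins, n_mics, 2))
noise = 0.05 * (rng.normal(size=(n_bins, n_mics, n_frames))
                + 1j * rng.normal(size=(n_bins, n_mics, n_frames)))
X = A @ S + noise
Y = subspace_preprocess(X, n_sources=2)
C = Y[0] @ Y[0].conj().T / n_frames
print(np.round(np.abs(C), 3))  # approximately the 2x2 identity
```

Because each bin's output is spatially white and has only N channels, the per-bin ICA that follows has a smaller, better-conditioned problem. In the square, low-noise case it reduces roughly to finding a unitary rotation.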
