A novel fuzzy clustering algorithm using observation weighting and context information for reverberant blind speech separation

Time-frequency masking has evolved into a powerful tool for tackling blind source separation problems. In previous work, mask estimation was performed with well-known standard clustering algorithms: spatial observation vectors, extracted from a set of microphones, were grouped into separate clusters, each representing a particular source. However, most off-the-shelf clustering methods are not robust to outliers or noise in the data. This lack of robustness often leads to incorrect localization and partitioning results, particularly for reverberant speech mixtures. To address this issue, we investigate the use of observation weights and context information as a means of improving clustering performance under reverberant conditions. While the observation weights improve the localization accuracy by ignoring noisy observations, context information smooths the cluster membership levels by exploiting the highly structured nature of speech signals in the time-frequency domain. In a number of experiments, we demonstrate the superiority of the proposed method over conventional fuzzy clustering in terms of both localization accuracy and speech separation performance.
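
To make the idea of observation weighting concrete, the following is a minimal sketch of a generic weighted fuzzy c-means update, in which each observation contributes to the cluster centers in proportion to a reliability weight. It is an illustration under stated assumptions, not the algorithm proposed in the paper: the function name weighted_fcm, its parameters, and the squared Euclidean distance are hypothetical choices, the weights are assumed to be given, and the context-information (membership smoothing) step is not included.

```python
import numpy as np


def weighted_fcm(X, weights, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Generic weighted fuzzy c-means (illustrative sketch, hypothetical API).

    X       : (n, d) array of observation vectors
    weights : (n,) non-negative reliability weights (assumed given)
    m       : fuzzifier, must be > 1
    Returns (centers, U) with centers of shape (c, d) and memberships U of shape (c, n).
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]

    # Random initial memberships; each column sums to one.
    U = rng.random((n_clusters, n))
    U /= U.sum(axis=0, keepdims=True)

    for _ in range(n_iter):
        # Center update: observations with small weights contribute little.
        G = weights * U ** m                                  # (c, n)
        centers = (G @ X) / G.sum(axis=1, keepdims=True)      # (c, d)

        # Squared Euclidean distance from every center to every observation.
        d2 = ((X[None, :, :] - centers[:, None, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)                            # avoid division by zero

        # Standard fuzzy c-means membership update.
        inv = d2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=0, keepdims=True)

        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break

    return centers, U
```

In this sketch the weights enter only the center update, so an observation with weight zero has no influence on where the clusters are located but still receives a membership value. How the weights are derived from the microphone signals is left open here.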
