An integration of source location cues for speech clustering in distributed microphone arrays

We propose a new approach for clustering competing speech sources using distributed microphone arrays. We first define two feature vectors: the first captures intra-node location information, while the second captures the level differences of the speech energy recorded at different nodes. We then model the first feature with a Watson mixture model and the second with a Dirichlet mixture model. Both types of information are integrated in an expectation-maximization algorithm that clusters the simultaneous speech sources. The proposed approach outperforms best-node selection and performs comparably to centralized processing in terms of conventional blind source separation metrics.
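
As a rough illustration of how the two cues could be combined, the sketch below computes the E-step posteriors of a joint mixture in which each cluster has one Watson component and one Dirichlet component, and the two log-likelihoods are simply added. This is not the authors' implementation. It assumes unit-norm complex location features, level features that lie on the probability simplex, and a single concentration `kappa` shared by all clusters, so the Watson normalizing constant cancels in the posteriors. All function and variable names are hypothetical.

```python
import math
import numpy as np

def dirichlet_logpdf(y, alpha):
    """Log-density of Dirichlet(alpha) at each row of y (rows on the simplex)."""
    log_norm = math.lgamma(alpha.sum()) - sum(math.lgamma(a) for a in alpha)
    return log_norm + ((alpha - 1.0) * np.log(y)).sum(axis=-1)

def watson_loglik_unnorm(x, mu, kappa):
    """Unnormalized complex Watson log-likelihood: kappa * |mu^H x|^2.

    Assumption: kappa is shared across clusters, so the normalizer is dropped.
    """
    return kappa * np.abs(x @ mu.conj()) ** 2

def e_step(x, y, mus, kappa, alphas, weights):
    """Posterior cluster probabilities from both cues (log-likelihoods added)."""
    n_clusters = len(weights)
    log_post = np.empty((x.shape[0], n_clusters))
    for k in range(n_clusters):
        log_post[:, k] = (
            np.log(weights[k])
            + watson_loglik_unnorm(x, mus[k], kappa)
            + dirichlet_logpdf(y, alphas[k])
        )
    # Subtract the row-wise maximum before exponentiating for numerical stability.
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)

# Toy data: two time-frequency points, two clusters, two nodes (made-up values).
x = np.array([[1.0, 0.0], [0.0, 1.0]], dtype=complex)  # location features
y = np.array([[0.8, 0.2], [0.2, 0.8]])                 # normalized level features
mus = np.array([[1.0, 0.0], [0.0, 1.0]], dtype=complex)
alphas = np.array([[8.0, 2.0], [2.0, 8.0]])
weights = np.array([0.5, 0.5])

post = e_step(x, y, mus, kappa=10.0, alphas=alphas, weights=weights)
labels = post.argmax(axis=1)
```

A full EM iteration would follow this with an M-step that re-estimates the mixture weights, the Watson mean directions, and the Dirichlet parameters from `post`. That step is omitted here because the abstract does not specify the update rules.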