Unsupervised sequential organization for cochannel speech separation

Sequential organization in cochannel speech has previously been studied using speaker-model based methods. A major limitation of these methods is that they require pretrained speaker models and prior knowledge (or detection) of the participating speakers. We propose an unsupervised clustering approach to sequential organization of cochannel speech. Given enhanced cepstral features, we search for the optimal assignment of simultaneous speech streams by maximizing the ratio of the between-cluster to within-cluster scatter matrices, penalized by concurrent pitches within individual speakers. A genetic algorithm is employed to speed up the search. Our method does not require trained speaker models, and experiments with both ideal and estimated simultaneous streams show that the proposed method outperforms a speaker-model based method in both speech segregation and computational efficiency.
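
As a rough illustration of the kind of objective described above, the sketch below scores a two-speaker assignment of simultaneous streams by a scatter ratio minus a pitch-overlap penalty. Several things here are assumptions and not taken from the paper: the ratio is computed as trace(S_W^-1 S_B), each stream is summarized by a single feature vector, the penalty counts frames in which two streams assigned to the same speaker overlap in time, and the names `scatter_ratio`, `overlap_penalty`, `objective`, `best_assignment`, and `weight` are hypothetical. An exhaustive search stands in for the genetic algorithm, which is only practical for a handful of streams.

```python
from itertools import product

import numpy as np


def scatter_ratio(features, labels):
    """Return trace(S_W^-1 S_B) for a labeling (assumed form of the ratio)."""
    overall_mean = features.mean(axis=0)
    dim = features.shape[1]
    s_w = np.zeros((dim, dim))  # within-cluster scatter
    s_b = np.zeros((dim, dim))  # between-cluster scatter
    for k in np.unique(labels):
        members = features[labels == k]
        mean_k = members.mean(axis=0)
        centered = members - mean_k
        s_w += centered.T @ centered
        diff = (mean_k - overall_mean)[:, None]
        s_b += len(members) * (diff @ diff.T)
    s_w += 1e-6 * np.eye(dim)  # small ridge keeps S_W invertible
    return float(np.trace(np.linalg.solve(s_w, s_b)))


def overlap_penalty(labels, spans):
    """Count frames where two streams given the same label overlap in time.

    spans[i] is a (start_frame, end_frame) pair for stream i (assumed layout).
    """
    penalty = 0
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            if labels[i] == labels[j]:
                start = max(spans[i][0], spans[j][0])
                end = min(spans[i][1], spans[j][1])
                penalty += max(0, end - start)
    return penalty


def objective(features, labels, spans, weight=1.0):
    """Scatter ratio minus a weighted overlap penalty (assumed combination)."""
    labels = np.asarray(labels)
    if np.unique(labels).size < 2:
        return -np.inf  # both speakers must receive at least one stream
    return scatter_ratio(features, labels) - weight * overlap_penalty(labels, spans)


def best_assignment(features, spans, weight=1.0):
    """Exhaustive search over two-speaker labelings (stand-in for the GA)."""
    features = np.asarray(features, dtype=float)
    best, best_score = None, -np.inf
    for rest in product((0, 1), repeat=len(features) - 1):
        labels = (0,) + rest  # fix stream 0 to remove the mirrored solution
        score = objective(features, labels, spans, weight)
        if score > best_score:
            best, best_score = labels, score
    return best, best_score


# Toy data: four streams, two per speaker; streams 0/2 and 1/3 overlap in time.
toy_features = [[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]]
toy_spans = [(0, 10), (12, 20), (0, 10), (12, 20)]
toy_labels, toy_score = best_assignment(toy_features, toy_spans)
```

On this toy input the search groups the first two streams together and the last two together, since that labeling has both the largest scatter ratio and no same-speaker overlap. The number of labelings doubles with each added stream, which is presumably why a faster search such as the genetic algorithm is needed in practice.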
