An approach to sequential grouping in cochannel speech

Model-based methods for sequential organization in cochannel speech require pretrained speaker models and often prior knowledge of participating speakers. We propose an unsupervised approach to sequential organization of cochannel speech. Based on cepstral features, we first cluster voiced speech into two speaker groups by maximizing the ratio of between- and within-group distances penalized by within-group concurrent pitches. To group unvoiced speech, we employ an onset/offset based analysis to generate time-frequency segments. Unvoiced segments are then labeled by the complementary portions of segregated voiced speech. Our method does not require any pretrained model and is computationally simple. Evaluations and comparisons show that the proposed method outperforms a model-based method in terms of speech segregation.

[1]  DeLiang Wang,et al.  Sequential organization in computational auditory scene analysis , 2007 .

[2]  Bhiksha Raj,et al.  Soft Mask Methods for Single-Channel Speaker Separation , 2007, IEEE Transactions on Audio, Speech, and Language Processing.

[3]  DeLiang Wang,et al.  Model-based sequential organization in cochannel speech , 2006, IEEE Transactions on Audio, Speech, and Language Processing.

[4]  DeLiang Wang,et al.  A Tandem Algorithm for Pitch Estimation and Voiced Speech Segregation , 2010, IEEE Transactions on Audio, Speech, and Language Processing.

[5]  Richard M. Stern,et al.  Voting for two speaker segmentation , 2006, INTERSPEECH.

[6]  Ning Ma,et al.  Recent advances in speech fragment decoding techniques , 2006, INTERSPEECH.

[7]  Paul Boersma,et al.  Praat, a system for doing phonetics by computer , 2002 .

[8]  Daniel P. W. Ellis,et al.  Speech separation using speaker-adapted eigenvoice speech models , 2010, Comput. Speech Lang..

[9]  D. Wang,et al.  Computational Auditory Scene Analysis: Principles, Algorithms, and Applications , 2008, IEEE Trans. Neural Networks.

[10]  DeLiang Wang,et al.  A computational auditory scene analysis system for speech segregation and robust speech recognition , 2010, Comput. Speech Lang..

[11]  DeLiang Wang,et al.  Sequential organization of speech in computational auditory scene analysis , 2009, Speech Commun..

[12]  P. Boersma Praat : doing phonetics by computer (version 5.1.05) , 2009 .

[13]  DeLiang Wang,et al.  Unsupervised sequential organization for cochannel speech separation , 2010, INTERSPEECH.

[14]  S.J. Wenndt,et al.  Unsupervised Indexing of Conversations with Short Speaker Utterances , 2007, 2007 IEEE Aerospace Conference.

[15]  DeLiang Wang,et al.  Segregation of unvoiced speech from nonspeech interference. , 2008, The Journal of the Acoustical Society of America.

[16]  Douglas A. Reynolds,et al.  An overview of automatic speaker diarization systems , 2006, IEEE Transactions on Audio, Speech, and Language Processing.