Offline speaker segmentation using genetic algorithms and mutual information

We present an evolutionary approach to speaker segmentation, a preprocessing step that is especially important for speaker recognition and audio content analysis. The approach combines a genetic algorithm (GA), whose individuals encode candidate segmentations of an audio recording, with a measure of the mutual information between the audio data and each candidate segmentation, which serves as the GA's fitness function. We introduce a compact encoding of the problem that shortens the GA individuals and improves the GA's convergence properties. We tested the algorithm on the segmentation of real audio data and compared its performance with several existing speaker segmentation algorithms; it obtained very good results on all test problems.
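
The general idea can be illustrated with a minimal sketch; it is not the implementation evaluated in the paper. It rests on several assumptions that the abstract does not state: each audio frame has already been quantized to a discrete symbol, the number of speaker changes is fixed in advance, the compact encoding is taken to be a sorted list of change-point positions (rather than one bit per frame), and the mutual information is estimated from simple counts. The names `mutual_information` and `evolve`, and all parameter values, are hypothetical.

```python
import math
import random
from collections import Counter


def mutual_information(symbols, change_points):
    """Count-based estimate of I(X; S) in nats, where X is the frame symbol
    and S is the index of the segment that contains the frame."""
    n = len(symbols)
    bounds = [0] + sorted(change_points) + [n]
    joint = Counter()      # counts of (symbol, segment) pairs
    seg_sizes = Counter()  # number of frames in each segment
    for seg, (start, end) in enumerate(zip(bounds, bounds[1:])):
        for x in symbols[start:end]:
            joint[(x, seg)] += 1
        seg_sizes[seg] += end - start
    sym_counts = Counter(symbols)
    mi = 0.0
    for (x, seg), c in joint.items():
        mi += (c / n) * math.log(c * n / (sym_counts[x] * seg_sizes[seg]))
    return mi


def evolve(symbols, n_changes, pop_size=40, generations=60, seed=0):
    """Toy GA: an individual is a sorted list of n_changes change points
    (assumed compact encoding); fitness is the mutual information above."""
    rng = random.Random(seed)
    n = len(symbols)

    def fitness(ind):
        return mutual_information(symbols, ind)

    def random_individual():
        return sorted(rng.sample(range(1, n), n_changes))

    def crossover(a, b):
        # One-point crossover on the change-point lists.
        cut = rng.randrange(n_changes + 1)
        return sorted(a[:cut] + b[cut:])

    def mutate(ind):
        # Shift one change point by a few frames, clamped to [1, n - 1].
        child = list(ind)
        i = rng.randrange(n_changes)
        child[i] = min(n - 1, max(1, child[i] + rng.randint(-3, 3)))
        return sorted(child)

    population = [random_individual() for _ in range(pop_size)]
    for _ in range(generations):
        # Elitist selection: keep the better half, refill with offspring.
        population.sort(key=fitness, reverse=True)
        elite = population[: pop_size // 2]
        children = [
            mutate(crossover(rng.choice(elite), rng.choice(elite)))
            for _ in range(pop_size - len(elite))
        ]
        population = elite + children
    return max(population, key=fitness)
```

Under these assumptions, an individual has length equal to the number of change points rather than the number of frames, which is one plausible reading of how a compact encoding shortens the GA individuals. Calling `evolve(symbols, n_changes=1)` on a symbol sequence with a single abrupt change returns a one-element list holding the estimated change position.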
