Unknown-multiple signal source clustering problem using ergodic HMM and applied to speaker classification
暂无分享,去创建一个
The authors consider signals originated from a sequence of sources. More specifically, the problems of segmenting such signals and relating the segments to their sources are addressed. This issue has wide applications in many fields. The report describes a resolution method that is based on an ergodic hidden Markov model (HMM), in which each HMM state corresponds to a signal source. The signal source sequence can be determined by using a decoding procedure (Viterbi algorithm or forward algorithm) over the observed sequence. Baum-Welch training is used to estimate HMM parameters from the training material. As an example of the multiple signal source classification problem, an experiment is performed on unknown speaker classification. The results show a classification rate of 79% for 4 male speakers. The results also indicate that the model is sensitive to the initial values of the ergodic HMM and that employing the long-distance LPC cepstrum is effective for signal preprocessing.
[1] M. Sugiyama,et al. Speech segmentation and clustering based on speaker features , 1993, 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[2] H. Gish,et al. An unsupervised, sequential learning algorithm for the segmentation of speech waveforms with multiple speakers , 1992, [Proceedings] ICASSP-92: 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[3] Steve Young,et al. Applications of stochastic context-free grammars using the Inside-Outside algorithm , 1990 .