Using redundant parallel architecture to improve speaker recognition performance

In this paper, we propose two modifications to speaker recognition. First, the correlations between frequency channels are of prime importance for speaker recognition, and some of these correlations are lost when the frequency domain is divided into sub-bands. We therefore propose a redundant parallel architecture in which most of the correlations are kept. Second, in the classical spectrum calculation, the log transformation of the power spectrum is generally applied after the filter bank. We show that applying this transformation before the filter bank is more advantageous in our case. For recognition, the Gaussian mixture model (GMM) algorithm is adopted. Experiments on speech corrupted by noise show that this approach adapts better to noisy environments than a conventional system, especially when some of the recognizers are pruned.
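The difference between the two orderings of the log transformation can be illustrated with a minimal sketch. It assumes a linearly spaced triangular filter bank whose weights sum to one per filter, and a random test frame; the function names, filter shapes, and sizes are illustrative assumptions, not the configuration used in the paper.

```python
import numpy as np

def triangular_filterbank(n_filters, n_bins):
    """Linearly spaced triangular filters; each row is normalized to sum to 1.
    (Assumed layout for illustration only.)"""
    edges = np.linspace(0, n_bins - 1, n_filters + 2)
    bins = np.arange(n_bins)
    bank = np.zeros((n_filters, n_bins))
    for i in range(n_filters):
        left, center, right = edges[i], edges[i + 1], edges[i + 2]
        rising = (bins - left) / (center - left)
        falling = (right - bins) / (right - center)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank / bank.sum(axis=1, keepdims=True)

def log_after_filterbank(power, bank, eps=1e-10):
    """Classical order: integrate the power spectrum, then take the log."""
    return np.log(bank @ power + eps)

def log_before_filterbank(power, bank, eps=1e-10):
    """Alternative order: take the log of each bin, then integrate."""
    return bank @ np.log(power + eps)

# Hypothetical test frame: windowed white noise, 256 samples -> 129 bins.
rng = np.random.default_rng(0)
frame = rng.standard_normal(256) * np.hanning(256)
power = np.abs(np.fft.rfft(frame)) ** 2

bank = triangular_filterbank(8, power.size)
after = log_after_filterbank(power, bank)
before = log_before_filterbank(power, bank)
```

With unit-sum filter weights, the log-before output is a weighted mean of log energies and, by Jensen's inequality, never exceeds the log-after output for the same filter. The two orderings therefore produce different features even from the same filter bank.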
