Intersession variability in speaker recognition: a behind the scene analysis

The representation of a speaker’s identity by means of Gaussian supervectors (GSV) is at the heart of most state-of-the-art speaker recognition systems. In this paper we present a novel procedure for visualizing GSVs, which gives qualitative insight into the information they capture. Using this visualization approach, we study on the Switchboard-I database (SWB-I) the relationship between a data-driven partition of the acoustic space and a knowledge-based partition (i.e., broad phonetic classes). Moreover, we analyze the structure of an intersession variability subspace (IVS), computed from the SWB-I database, by displaying the projection of a speaker’s GSV onto the eigenvectors with the highest eigenvalues. This analysis reveals a strong presence of linguistic information in the IVS components with the highest energy. Finally, after the information contained in the IVS is projected away from the speaker’s GSV, a visualization of the resulting GSV shows the speaker’s characteristic patterns of spectral energy allocation.
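The two projection steps mentioned above (projecting a GSV onto the leading IVS eigenvectors, and projecting the IVS away from a GSV) can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the synthetic data, the dimensions, and the helper names `top_eigvecs` and `project_away` are assumptions made for illustration, and the subspace is estimated here simply from within-speaker deviations of the supervectors.

```python
import numpy as np

def top_eigvecs(session_diffs, k):
    """Return k orthonormal directions of highest energy as columns.

    session_diffs has shape (n_samples, dim); each row is a supervector
    minus the mean supervector of the same speaker (assumed estimator).
    """
    # The right singular vectors of the data matrix are the eigenvectors
    # of its scatter matrix, ordered by decreasing eigenvalue.
    _, _, vt = np.linalg.svd(session_diffs, full_matrices=False)
    return vt[:k].T  # shape (dim, k)

def project_away(gsv, U):
    """Remove the component of gsv lying in the span of U's columns."""
    return gsv - U @ (U.T @ gsv)

# Synthetic stand-in for real supervectors (hypothetical sizes).
rng = np.random.default_rng(0)
dim, n_speakers, n_sessions, k = 12, 5, 4, 2

# Session offsets are confined to a k-dimensional subspace by construction.
true_basis = np.linalg.qr(rng.normal(size=(dim, k)))[0]
speakers = rng.normal(size=(n_speakers, 1, dim))
offsets = rng.normal(size=(n_speakers, n_sessions, k)) @ true_basis.T
gsvs = speakers + offsets  # shape (n_speakers, n_sessions, dim)

# Within-speaker deviations carry only session variability.
diffs = (gsvs - gsvs.mean(axis=1, keepdims=True)).reshape(-1, dim)
U = top_eigvecs(diffs, k)

ivs_coords = U.T @ gsvs[0, 0]           # projection onto the IVS eigenvectors
compensated = project_away(gsvs[0, 0], U)  # GSV with the IVS removed
```

In this toy setting, two sessions of the same speaker map to the same compensated vector, since their difference lies entirely in the estimated subspace. With real data the subspace only approximates session variability, so the match would be approximate.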
