CEPSTRUM-LIKE ICA REPRESENTATIONS FOR TEXT INDEPENDENT SPEAKER RECOGNITION

Automatic methods for determining voiceprints from speech samples predominantly use short-time spectra to derive features specific to a given speaker. Among these, Mel-Frequency Cepstral Coefficient (MFCC) features are widely used today. The speaker recognition method presented here is also based on short-time spectra, but its feature extraction process differs from the MFCC process. The motivation was to avoid what we see as shortcomings of present approaches, particularly the blurring effect in the frequency domain, which confuses rather than helps in distinguishing speakers. We introduce a speech synthesis model that can be identified using Independent Component Analysis (ICA). The ICA representation of log spectral data yields cepstrum-like, independent coefficients that capture correlations among frequency bands specific to the given speaker. It also yields speaker-specific basis functions. Coefficients determined from test data using a speaker’s true basis functions show a low degree of correlation, while those determined using other speakers’ basis functions do not. This enables the system to reliably recognize speakers. The resulting speaker recognition method is text-independent, invariant over time, and robust to channel variability. Its effectiveness has been tested in representing and recognizing speakers from a set of 462 people from the TIMIT database.
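The recognition idea described above, that coefficients computed with the true speaker's basis are close to uncorrelated while those computed with another speaker's basis are not, can be illustrated with a small toy sketch. Everything in it is an assumption made for illustration and not the procedure of this work: the two-dimensional "log-spectral" frames, the mixing matrices, the speaker names, the helper names (`project`, `mean_abs_correlation`, `recognize`, `invert_2x2`), and the decision rule of choosing the speaker with the lowest mean absolute correlation. The unmixing matrices are obtained here by direct inversion of known mixing matrices; in practice they would be estimated from training data by an ICA algorithm.

```python
from itertools import product
from math import sqrt


def project(matrix, frames):
    """Multiply every frame (a vector) by `matrix`, returning the list of results."""
    return [[sum(m * x for m, x in zip(row, frame)) for row in matrix]
            for frame in frames]


def mean_abs_correlation(coeffs):
    """Mean absolute Pearson correlation over all pairs of coefficient dimensions."""
    n, dim = len(coeffs), len(coeffs[0])
    means = [sum(c[i] for c in coeffs) / n for i in range(dim)]
    centered = [[c[i] - means[i] for i in range(dim)] for c in coeffs]

    def cov(i, j):
        return sum(c[i] * c[j] for c in centered) / n

    pairs = [(i, j) for i in range(dim) for j in range(i + 1, dim)]
    return sum(abs(cov(i, j)) / sqrt(cov(i, i) * cov(j, j))
               for i, j in pairs) / len(pairs)


def recognize(frames, unmixing_by_speaker):
    """Assumed decision rule: pick the speaker whose basis gives the least
    correlated coefficients on the test frames."""
    scores = {name: mean_abs_correlation(project(w, frames))
              for name, w in unmixing_by_speaker.items()}
    return min(scores, key=scores.get), scores


def invert_2x2(m):
    """Closed-form inverse of a 2x2 matrix (toy stand-in for ICA estimation)."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


# Toy sources: a full grid of value pairs, so the two sources are exactly
# uncorrelated (and non-Gaussian). Purely illustrative data.
levels = [-1.0, -0.5, 0.0, 0.5, 1.0]
sources = [list(p) for p in product(levels, repeat=2)]

# Hypothetical speaker-specific mixing matrices (columns act as basis functions).
mixing = {
    "speaker_a": [[1.0, 0.5], [0.2, 1.0]],
    "speaker_b": [[1.0, -0.6], [0.4, 1.0]],
}
unmixing = {name: invert_2x2(a) for name, a in mixing.items()}

# Test frames generated with speaker_a's basis: x = A s.
frames_a = project(mixing["speaker_a"], sources)
best, scores = recognize(frames_a, unmixing)
print(best, scores)
```

In this toy setting the coefficients recovered with the matching basis are the original uncorrelated sources, so the matching speaker scores near zero and the mismatched speaker scores clearly higher. Note that low correlation is a weaker condition than statistical independence, so a score of this kind is only a rough proxy for the independence that ICA targets.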