Paper information - Local Fisher discriminant analysis for spoken language identification

Local Fisher discriminant analysis for spoken language identification

The i-vector is a state-of-the-art representation widely used in spoken language identification systems. Because i-vectors capture total variability, including factors unrelated to language, discriminant analysis methods such as linear discriminant analysis (LDA) and nonparametric discriminant analysis (NDA) have been introduced to find the most discriminative features while removing the undesired variability. However, these methods either ignore the local structure of the data or exploit it only weakly. In this study, we introduce local Fisher discriminant analysis (LFDA) as a post-processing discriminant analysis method for extracting discriminative features from i-vectors. LFDA is a full-rank method that takes the local structure of the data into account, which makes it suitable for non-Gaussian data such as multimodal class distributions. Compared with LDA and NDA, LFDA is a pairwise local method that makes the distribution of same-class samples more compact and thereby yields a larger number of discriminative features. Experimental results indicate that LFDA is more effective than LDA and NDA for the i-vector-based language identification task.

Lemao Liu | Hisashi Kawai | Xugang Lu | Peng Shen
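
The abstract does not spell out the LFDA computation, so the following is only a minimal NumPy sketch of the standard LFDA formulation (Sugiyama, 2006), not the authors' implementation. The function name `lfda`, the local-scaling neighbour index `k=7`, and the small ridge term `reg` are assumptions made for illustration. Each row of `X` stands in for one i-vector, and `y` holds the language labels.

```python
import numpy as np

def lfda(X, y, n_components, k=7, reg=1e-6):
    """Hypothetical helper: return a (d, n_components) LFDA projection matrix."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n, d = X.shape

    # Pairwise weights. Pairs from different classes contribute only to the
    # between-class scatter, with constant weight 1/n.
    Ww = np.zeros((n, n))
    Wb = np.full((n, n), 1.0 / n)

    for c in np.unique(y):
        idx = np.flatnonzero(y == c)
        nc = idx.size
        Xc = X[idx]

        # Squared Euclidean distances between all samples of class c.
        G = Xc @ Xc.T
        g = np.diag(G)
        sq = np.maximum(g[:, None] + g[None, :] - 2.0 * G, 0.0)

        # Local scaling: sigma_i is the distance to the k-th nearest
        # same-class neighbour (column 0 of the sorted row is the sample itself).
        kk = min(k, nc - 1)
        sigma = np.sqrt(np.sort(sq, axis=1)[:, kk])
        A = np.exp(-sq / (np.outer(sigma, sigma) + 1e-12))

        # Same-class pairs: nearby samples get large weights, so distant
        # clusters of one class are not forced to merge.
        Ww[np.ix_(idx, idx)] = A / nc
        Wb[np.ix_(idx, idx)] = A * (1.0 / n - 1.0 / nc)

    # (1/2) * sum_ij W_ij (x_i - x_j)(x_i - x_j)^T equals X^T (D - W) X
    # for a symmetric W with D = diag(row sums of W).
    Sw = X.T @ (np.diag(Ww.sum(axis=1)) - Ww) @ X
    Sb = X.T @ (np.diag(Wb.sum(axis=1)) - Wb) @ X

    # Ridge term (an assumption, for numerical stability only).
    Sw += reg * np.trace(Sw) / d * np.eye(d)

    # Solve Sb v = lambda Sw v by whitening with the Cholesky factor of Sw.
    L = np.linalg.cholesky(Sw)
    Linv = np.linalg.inv(L)
    M = Linv @ Sb @ Linv.T
    M = (M + M.T) / 2.0
    evals, U = np.linalg.eigh(M)
    V = Linv.T @ U

    # Keep the directions with the largest generalized eigenvalues.
    order = np.argsort(evals)[::-1][:n_components]
    return V[:, order]
```

Projected features would then be obtained as `X @ lfda(X, y, n_components)`. Because the local between-class scatter is generally not limited to rank C-1 for C classes, `n_components` can exceed the C-1 dimensions available to LDA, which is the "full-rank" property the abstract refers to.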
