Nearest neighbor discriminant analysis for language recognition

Many state-of-the-art i-vector based voice biometric systems use linear discriminant analysis (LDA) as a post-processing stage, both to increase the computational efficiency of the back-end via dimensionality reduction and to annihilate undesired (noisy) directions in the total variability subspace. The traditional approach to computing the LDA transform uses parametric representations of both the intra- and inter-class scatter matrices, based on the Gaussian distribution assumption. However, the actual distribution of i-vectors is not necessarily Gaussian, particularly in the presence of noise and channel distortions. In addition, the rank of the LDA projection (i.e., the maximum number of available discriminant bases) is limited to the number of classes minus one. Accordingly, language recognition tasks on noisy data that involve only a few language classes receive limited or no benefit from LDA post-processing. Motivated by this observation, we present an alternative non-parametric discriminant analysis (NDA) technique that measures both the within- and between-language variation on a local basis using the nearest neighbor rule. The effectiveness of the NDA method is evaluated on noisy language recognition tasks using speech material from the DARPA Robust Automatic Transcription of Speech (RATS) program. Experimental results indicate that NDA is more effective than traditional parametric LDA for language recognition under noisy and channel-degraded conditions.
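The rank limitation and the nearest neighbor idea described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes Euclidean distance, an arbitrary neighborhood size `k=3`, unweighted local differences (the weighting function used in classical NDA formulations is omitted), and synthetic vectors in place of real i-vectors. The function names are hypothetical.

```python
import numpy as np

def local_means(X, pool, k, exclude_self=False):
    """For every row of X, return the mean of its k nearest neighbors in pool."""
    d = np.linalg.norm(X[:, None, :] - pool[None, :, :], axis=2)
    if exclude_self:
        # X and pool are the same set: a point is not its own neighbor.
        np.fill_diagonal(d, np.inf)
    idx = np.argsort(d, axis=1)[:, :k]
    return pool[idx].mean(axis=1)

def nda_scatter(X, y, k=3):
    """Unweighted local within- and between-class scatter (simplified NDA)."""
    dim = X.shape[1]
    Sw = np.zeros((dim, dim))
    Sb = np.zeros((dim, dim))
    classes = np.unique(y)
    for ci in classes:
        Xi = X[y == ci]
        # Within-class: deviation from the local mean of same-class neighbors.
        diff = Xi - local_means(Xi, Xi, k, exclude_self=True)
        Sw += diff.T @ diff
        for cj in classes:
            if cj == ci:
                continue
            # Between-class: deviation from the local mean of class-cj neighbors.
            diff = Xi - local_means(Xi, X[y == cj], k)
            Sb += diff.T @ diff
    return Sw, Sb

def lda_between_scatter(X, y):
    """Parametric LDA between-class scatter built from global class means."""
    mu = X.mean(axis=0)
    Sb = np.zeros((X.shape[1], X.shape[1]))
    for c in np.unique(y):
        Xc = X[y == c]
        d = Xc.mean(axis=0) - mu
        Sb += len(Xc) * np.outer(d, d)
    return Sb

def discriminant_basis(Sw, Sb, n_components):
    """Leading solutions of Sb w = lambda Sw w via Cholesky whitening of Sw."""
    L_inv = np.linalg.inv(np.linalg.cholesky(Sw))
    vals, vecs = np.linalg.eigh(L_inv @ Sb @ L_inv.T)
    order = np.argsort(vals)[::-1][:n_components]
    return L_inv.T @ vecs[:, order]

# Synthetic stand-in for i-vectors: 3 classes, 60 samples each, 10 dimensions.
rng = np.random.default_rng(0)
y = np.repeat(np.arange(3), 60)
X = rng.normal(size=(180, 10)) + 2.0 * rng.normal(size=(3, 10))[y]

Sw, Sb_nda = nda_scatter(X, y, k=3)
Sb_lda = lda_between_scatter(X, y)
print("LDA between-class rank:", np.linalg.matrix_rank(Sb_lda, tol=1e-8))
print("NDA between-class rank:", np.linalg.matrix_rank(Sb_nda, tol=1e-8))
W = discriminant_basis(Sw, Sb_nda, n_components=5)  # more than classes - 1
print("Projected shape:", (X @ W).shape)
```

With only three classes, the parametric between-class scatter cannot exceed rank two, whereas the local scatter is accumulated from one difference vector per sample and per competing class, so it is generally not restricted in this way. This is why a non-parametric transform can retain more discriminant directions than the number of classes minus one.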
