Speaker discrimination based on fuzzy fusion and feature reduction techniques

In this paper, we investigate speaker discrimination using multi-classifier fusion, with a focus on the effects of feature reduction. Speaker discrimination consists of automatically distinguishing between two speakers using the vocal characteristics of their speech. Features are extracted as Mel Frequency Spectral Coefficients and then reduced using the Relative Speaker Characteristic (RSC) together with Principal Component Analysis (PCA). Several classification methods are implemented to perform the discrimination task. Because different classifiers are employed, two decision-level fusion algorithms, referred to as Weighted Fusion and Fuzzy Fusion, are proposed to improve classification performance. Both algorithms are based on weighting the outputs of the different classifiers. The effects of speaker gender and of feature reduction on the discrimination task are also examined. Our approaches were evaluated on a subset of Hub-4 Broadcast-News. The experimental results show that the RSC–PCA feature reduction improves speaker discrimination accuracy by 5–15%. In addition, the proposed fusion methods yield an improvement of about 10% over the individual scores of the classifiers. Finally, we observe that speaker gender has an important impact on discrimination performance.
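As an illustration of decision-level fusion by weighting classifier outputs, the following minimal sketch combines the scores of several classifiers into one decision. It is not the Weighted Fusion or Fuzzy Fusion algorithm of this paper, whose details are not given in the abstract. The function name `weighted_fusion`, the use of validation accuracies as weights, the assumption that each score lies in [0, 1], and the 0.5 threshold are all hypothetical choices made for the example.

```python
def weighted_fusion(scores, accuracies, threshold=0.5):
    """Fuse per-classifier scores into a single decision.

    scores      -- one score in [0, 1] per classifier for the positive class
                   (assumed convention, not taken from the paper)
    accuracies  -- one validation accuracy per classifier, used here as an
                   assumed weighting scheme
    threshold   -- assumed decision threshold on the fused score
    """
    if len(scores) != len(accuracies):
        raise ValueError("one accuracy per classifier score is required")
    total = sum(accuracies)
    if total <= 0:
        raise ValueError("accuracies must sum to a positive value")

    # Normalize the accuracies so that the weights sum to 1.
    weights = [a / total for a in accuracies]

    # The fused score is the weighted average of the individual scores.
    fused = sum(w * s for w, s in zip(weights, scores))
    return fused, fused >= threshold


if __name__ == "__main__":
    # Three hypothetical classifiers; the most accurate one is also the
    # most confident, so it outweighs the two weaker ones.
    fused, decision = weighted_fusion([0.9, 0.4, 0.3], [0.8, 0.6, 0.6])
    print(round(fused, 2), decision)
```

In this toy case a simple majority vote would reject the positive class, since two of the three scores fall below 0.5, whereas the weighted average accepts it because the more reliable classifier carries more weight.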
