Discriminating Speakers by Their Voices - A Fusion Based Approach

The task of Speaker Discrimination (SD) consists of determining whether two speech segments belong to the same speaker. In this research field, it is often difficult to decide which classifier is best in terms of accuracy and robustness. To address this question, we implemented nine classifiers: Support Vector Machines (SVM), Linear Discriminant Analysis, Multi-Layer Perceptron, Generalized Linear Model, Self-Organizing Map, AdaBoost, Second-Order Statistical Measures, Linear Regression and Gaussian Mixture Models. Furthermore, a new fusion approach is proposed and evaluated for speaker discrimination. Several speaker discrimination experiments were conducted on the Hub4 Broadcast-News corpus using relatively short segments. The results show that the SVM is the best individual classifier and that the proposed fusion approach gives the best performance overall.
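The abstract does not describe the fusion rule itself, so the following is only a generic illustration of classifier fusion, not the authors' method. It is a minimal sketch assuming that each classifier outputs a same-speaker score in [0, 1]; the function names, the weights and the 0.5 threshold are hypothetical choices made for this example.

```python
from typing import Sequence


def fuse_scores(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted average of per-classifier same-speaker scores.

    Assumption: every score lies in [0, 1] and every weight is non-negative.
    """
    if len(scores) != len(weights) or not scores:
        raise ValueError("scores and weights must be non-empty and equally long")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("at least one weight must be positive")
    return sum(s * w for s, w in zip(scores, weights)) / total_weight


def same_speaker(scores: Sequence[float], weights: Sequence[float],
                 threshold: float = 0.5) -> bool:
    """Decide 'same speaker' when the fused score reaches the threshold."""
    return fuse_scores(scores, weights) >= threshold


def majority_vote(decisions: Sequence[bool]) -> bool:
    """Decision-level alternative: True if strictly more than half agree."""
    if not decisions:
        raise ValueError("at least one decision is required")
    return 2 * sum(decisions) > len(decisions)
```

In such a scheme, the weights could for instance be set from each classifier's accuracy on a held-out development set, so that a stronger classifier such as the SVM contributes more to the fused decision.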