A pertinent learning machine input feature for speaker discrimination by voice

This work is part of a global speech indexing project entitled ISDS and concerns two types of machine learning classifier used by that project: Neural Networks (NN) and Support Vector Machines (SVM). In the present paper, however, we deal only with the problem of speaker discrimination, and we restrict our analysis to a new relative, reduced speaker model that serves as the input feature of the learning machines (NN and SVM). Speaker discrimination consists in checking whether two speech signals belong to the same speaker or not, using features of the speaker taken directly from his or her own speech. The proposed feature, called the Relative Speaker Characteristic (RSC), is based on a relative characterization of the speaker and is well suited to NN and SVM training. The RSC models one speaker relative to another, meaning that each speaker model is determined from both its speech signal and its dual speech. This investigation shows that the relative model, used as input of the classifier, improves the training by shortening the learning time and enhancing the discrimination accuracy of that classifier.

Speaker discrimination experiments are conducted on two different databases, the Hub4 Broadcast-News database and a telephone speech database, using two learning machines, a Multi-Layer Perceptron (MLP) and a Support Vector Machine (SVM), with several input characteristics. A comparative investigation is also conducted on the same databases using two classical discriminative measures: the covariance-based mono-Gaussian distance and the Kullback-Leibler distance.

The originality of this relativist approach is that the new characteristic gives the speaker a flexible model, since the model changes every time the competing speaker model changes. Results show that the new input characteristic is interesting for speaker discrimination. Furthermore, using the Relative Speaker Characteristic reduces the size of the classifier input and the training time.
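As an illustration of the classical baseline mentioned above, the following is a minimal sketch of a Kullback-Leibler distance between two speech segments, each modeled by a single full-covariance Gaussian. It is not the authors' implementation: the symmetric form of the divergence, the function names (`fit_gaussian`, `symmetric_kl`, `same_speaker`), the small ridge term and the decision threshold are all assumptions made for the example, and the abstract does not specify which variant the paper uses.

```python
import numpy as np


def fit_gaussian(frames, ridge=1e-6):
    """Fit one full-covariance Gaussian to an (n_frames, dim) feature array.

    The ridge term is an assumption of this sketch; it only keeps the
    covariance matrix invertible for short segments.
    """
    frames = np.asarray(frames, dtype=float)
    mean = frames.mean(axis=0)
    cov = np.cov(frames, rowvar=False) + ridge * np.eye(frames.shape[1])
    return mean, cov


def symmetric_kl(model_a, model_b):
    """Symmetric KL divergence KL(a||b) + KL(b||a) between two Gaussians."""
    mean_a, cov_a = model_a
    mean_b, cov_b = model_b
    dim = mean_a.shape[0]
    inv_a = np.linalg.inv(cov_a)
    inv_b = np.linalg.inv(cov_b)
    diff = mean_a - mean_b
    # The log-determinant terms of the two directed divergences cancel.
    trace_term = np.trace(inv_b @ cov_a) + np.trace(inv_a @ cov_b) - 2 * dim
    mean_term = diff @ (inv_a + inv_b) @ diff
    return 0.5 * (trace_term + mean_term)


def same_speaker(frames_a, frames_b, threshold):
    """Hypothetical decision rule: same speaker if the distance is small."""
    distance = symmetric_kl(fit_gaussian(frames_a), fit_gaussian(frames_b))
    return distance < threshold
```

In practice the threshold would have to be tuned on held-out data. A distance of this kind compares the two segments directly, whereas the learning machines in the paper take a speaker characteristic as input and are trained to make the decision.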
