Using Data-driven and Phonetic Units for Speaker Verification

Recognition of speaker identity based on modeling the streams produced by phonetic decoders (phonetic speaker recognition) has gained popularity during the past few years. Two of the major problems that arise when phone based systems are being developed are the possible mismatches between the development and evaluation data and the lack of transcribed databases. Data-driven segmentation techniques provide a potential solution to these problems because they do not use transcribed data and can easily be applied on development data minimizing the mismatches. In this paper we compare speaker recognition results using phonetic and data-driven decoders. To this end, we have compared the results obtained with a speaker recognition system based on data-driven acoustic units and phonetic speaker recognition systems trained on Spanish and English data. Results obtained on the NIST 2005 Speaker Recognition Evaluation data show that the data-driven approach outperforms the phonetic one and that further improvements can be achieved by combining both approaches

[1]  Douglas A. Reynolds,et al.  The SuperSID project: exploiting high-level information for high-accuracy speaker recognition , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03)..

[2]  George R. Doddington,et al.  Speaker recognition based on idiolectal differences between speakers , 2001, INTERSPEECH.

[3]  Jonathan G. Fiscus,et al.  Darpa Timit Acoustic-Phonetic Continuous Speech Corpus CD-ROM {TIMIT} | NIST , 1993 .

[4]  Luis A. Hernández Gómez,et al.  On the relationship between phonetic modeling precision and phonetic speaker recognition accuracy , 2005, INTERSPEECH.

[5]  S. Hyakin,et al.  Neural Networks: A Comprehensive Foundation , 1994 .

[6]  José B. Mariño,et al.  Albayzin speech database: design of the phonetic corpus , 1993, EUROSPEECH.

[7]  K. M. Ponting,et al.  Computational Models of Speech Pattern Processing , 1999, NATO ASI Series.

[8]  Asmaa El Hannani,et al.  Exploiting High-Level Information Provided by ALISP in Speaker Recognition , 2005, NOLISP.

[9]  Joseph P. Campbell,et al.  Phonetic, idiolectal and acoustic speaker recognition , 2001, Odyssey.

[10]  Bishnu S. Atal,et al.  Efficient coding of LPC parameters by temporal decomposition , 1983, ICASSP.

[11]  Gérard Chollet,et al.  Toward ALISP: A proposal for Automatic Language Independent Speech Processing , 1999 .

[12]  Alvin F. Martin,et al.  The DET curve in assessment of detection task performance , 1997, EUROSPEECH.

[13]  William M. Campbell,et al.  Phonetic Speaker Recognition with Support Vector Machines , 2003, NIPS.