Developing Speaker Recognition System: From Prototype to Practical Application

In this paper, we summarize the main achievements of the four-year PUMS project (2003-2007). The emphasis is on practical implementation: how we moved from Matlab and Praat scripting to applications implemented in C/C++ for Windows, UNIX, Linux and Symbian environments, with the aim of enhancing technology transfer. We summarize how the baseline methods have been implemented in practice and how the results are utilized in forensic applications, and we compare recognition results to the state of the art and to existing commercial products such as ASIS, FreeSpeech and VoiceNet.
