FPGA Implementation for GMM-Based Speaker Identification

Highly accurate personal identification systems are increasingly required. Passwords and PINs can be forgotten or forged and are no longer considered to offer a high level of security. The use of biological features, or biometrics, is becoming widely accepted as the next level for security systems. Speaker identification is a biometric method that identifies a person from their voice. Speech signals carry speaker-specific characteristics because different speakers have different vocal tract resonances. These differences can be exploited by extracting feature vectors such as Mel-Frequency Cepstral Coefficients (MFCCs) from the speech signal. A well-known statistical modelling technique, the Gaussian Mixture Model (GMM), then models the distribution of each speaker's MFCCs in a multidimensional acoustic space. The GMM-based speaker identification system has features that make it promising for hardware acceleration. This paper describes a hardware implementation of the classification stage of a text-independent GMM-based speaker identification system. The aim was to produce a system that can identify large numbers of voice streams simultaneously in real time. This capability has important potential applications in security and in automated call centres. A speedup factor of ninety was achieved compared to a software implementation on a standard PC.
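The classification step described above can be illustrated with a minimal software sketch. It is not the hardware design from the paper: it assumes diagonal-covariance GMMs, scores a sequence of feature vectors against each speaker model by summing per-frame log-likelihoods (using the log-sum-exp trick for numerical stability), and picks the best-scoring speaker. All function names, the toy two-dimensional "features", and the model parameters are hypothetical and chosen only for illustration.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(frames, weights, means, variances):
    """Sum over frames of log( sum_k w_k * N(x | mean_k, var_k) )."""
    total = 0.0
    for x in frames:
        # Per-component log terms: log w_k + log N(x | mean_k, var_k)
        terms = [
            math.log(w) + log_gaussian_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)
        ]
        # Log-sum-exp: subtract the largest term before exponentiating
        peak = max(terms)
        total += peak + math.log(sum(math.exp(t - peak) for t in terms))
    return total


def identify_speaker(frames, models):
    """Return the best-scoring speaker name and all per-speaker scores."""
    scores = {
        name: gmm_log_likelihood(frames, *params)
        for name, params in models.items()
    }
    return max(scores, key=scores.get), scores


# Toy models (assumed values): (weights, means, variances) per speaker,
# with 2-dimensional vectors standing in for real MFCC features.
models = {
    "speaker_a": ([0.5, 0.5], [[0.0, 0.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]),
    "speaker_b": ([0.5, 0.5], [[5.0, 5.0], [6.0, 6.0]], [[1.0, 1.0], [1.0, 1.0]]),
}
frames = [[0.2, 0.1], [0.9, 1.1]]

best, scores = identify_speaker(frames, models)
print(best)
```

Each speaker's score is computed independently of the others, and each frame's contribution is independent of the other frames. That independence is one plausible reason this kind of computation maps well onto parallel hardware, although the sketch makes no claim about how the paper's design exploits it.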
