Paper information - FPGA Implementation for GMM-Based Speaker Identification

FPGA Implementation for GMM-Based Speaker Identification

In today's society, highly accurate personal identification systems are required. Passwords and PINs can be forgotten or forged, and are no longer considered to offer a high level of security. The use of biological features (biometrics) is becoming widely accepted as the next level for security systems. Biometric speaker identification is a method of identifying a person from their voice. Speaker-specific characteristics exist in speech signals because different speakers have different vocal tract resonances. These differences can be exploited by extracting feature vectors such as Mel-Frequency Cepstral Coefficients (MFCCs) from the speech signal. A well-known statistical modelling technique, the Gaussian Mixture Model (GMM), then models the distribution of each speaker's MFCCs in a multidimensional acoustic space. The GMM-based speaker identification system has features that make it promising for hardware acceleration. This paper describes a hardware implementation of the classification stage of a text-independent GMM-based speaker identification system. The aim was to produce a system that can identify large numbers of voice streams simultaneously in real time, which has important potential applications in security and in automated call centres. A speedup factor of ninety was achieved compared with a software implementation on a standard PC.
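The classification stage summarised in the abstract can be illustrated with a minimal software sketch. It rests on assumptions the abstract does not state: diagonal covariance matrices, a maximum log-likelihood decision rule summed over frames, and a log-add routine for combining mixture components in the log domain. All function names and the toy two-dimensional "MFCC" vectors are hypothetical and serve only as illustration; this is not the paper's hardware design.

```python
import math

def log_add(a, b):
    """Return log(exp(a) + exp(b)) computed in the log domain."""
    if a < b:
        a, b = b, a
    if b == float("-inf"):
        return a
    return a + math.log1p(math.exp(b - a))

def frame_log_likelihood(x, gmm):
    """Log-likelihood of one feature vector under a GMM.

    Assumption: gmm is a list of (weight, means, variances) tuples,
    one per mixture component, with diagonal covariances.
    """
    total = float("-inf")
    for weight, means, variances in gmm:
        log_p = math.log(weight)
        for xi, mu, var in zip(x, means, variances):
            log_p += -0.5 * (math.log(2.0 * math.pi * var) + (xi - mu) ** 2 / var)
        total = log_add(total, log_p)
    return total

def identify(frames, speaker_models):
    """Score every speaker model on all frames and return the best match."""
    scores = {
        name: sum(frame_log_likelihood(x, gmm) for x in frames)
        for name, gmm in speaker_models.items()
    }
    return max(scores, key=scores.get), scores

# Toy data (hypothetical): two speakers, two mixture components each.
speaker_models = {
    "alice": [(0.6, [0.0, 0.0], [1.0, 1.0]), (0.4, [1.0, -1.0], [0.5, 0.5])],
    "bob": [(0.5, [5.0, 5.0], [1.0, 1.0]), (0.5, [6.0, 4.0], [0.5, 0.5])],
}
frames = [[0.1, -0.2], [0.3, 0.1], [0.9, -0.8]]
best, scores = identify(frames, speaker_models)
```

Each speaker's score is independent of every other speaker's, and each frame's contribution is independent of the other frames, which is the kind of structure that lends itself to parallel evaluation in hardware.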

Steven F. Quigley | Tim Allen | Phaklen EhKan
