Low-Cost Speaker and Language Recognition Systems Running on a Raspberry Pi

This paper describes two state-of-the-art, portable voice-based systems: one for speaker authentication and one for language recognition. The authentication system provides secure access to a home media center, while the language recognition system can serve as a preliminary step before automatically transcribing speech and translating the recognized text from its original language into another. The main advantage of the developed systems is that they run on a low-cost embedded device, such as a Raspberry Pi (RPi), using only open-source projects. This makes them easy to replicate or integrate into other systems, and also suitable for educational projects in electronics. Both systems have been tested on real data with very good results. For the authentication system, validation takes 3.3 seconds on average, with an equal error rate (EER) of 19% on 20-second test files, and it was tested with up to 87 different speakers. The language recognition system recognizes up to six languages. For this system, considerable effort went into reducing processing time and memory requirements while keeping the recognition rate high. The final system uses 64 Gaussians and 200 i-vectors, obtaining a Cavg error rate of 8.6% for the six languages.
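
The EER reported for the authentication system is the operating point at which the false-acceptance rate and the false-rejection rate are equal. As a minimal illustrative sketch (not the evaluation code used in the paper), it can be estimated from two sets of verification scores. The function name `equal_error_rate` and the toy scores below are assumptions made purely for this example.

```python
import numpy as np

def equal_error_rate(target_scores, impostor_scores):
    """Estimate the EER by sweeping a threshold over all observed scores.

    Returns (eer, threshold), where the threshold is the one at which the
    false-acceptance and false-rejection rates are closest to each other.
    """
    target = np.asarray(target_scores, dtype=float)
    impostor = np.asarray(impostor_scores, dtype=float)
    thresholds = np.sort(np.concatenate([target, impostor]))

    best_gap, eer, best_thr = np.inf, 1.0, thresholds[0]
    for thr in thresholds:
        frr = np.mean(target < thr)      # genuine speakers wrongly rejected
        far = np.mean(impostor >= thr)   # impostors wrongly accepted
        gap = abs(far - frr)
        if gap < best_gap:
            # Report the midpoint of the two rates at the closest crossing.
            best_gap, eer, best_thr = gap, (far + frr) / 2.0, thr
    return eer, best_thr

# Hypothetical scores: higher means "more likely the claimed speaker".
genuine = [0.9, 0.8, 0.7, 0.4]
impostors = [0.6, 0.3, 0.2, 0.1]
eer, thr = equal_error_rate(genuine, impostors)
print(f"EER = {eer:.2%} at threshold {thr}")
```

With real trial lists, the two rates rarely cross exactly at an observed score, so interpolating between neighbouring thresholds gives a smoother estimate than this simple sweep.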
