Using Approximate Entropy as a speech quality measure for a speaker recognition system

In this paper, we show that Approximate Entropy (ApEn) can be used to detect high-quality speech frames in an otherwise distorted speech signal. By exploiting the quasi-periodicity of speech, ApEn detects small aberrations in speech frames that would otherwise degrade the performance of an automatic speaker recognition (ASR) system. In addition, we obtain the statistics of ApEn values representative of clean speech and propose threshold bounds that maximize recognition rates. In our simulations, ApEn outperforms other popular voice activity detector (VAD) algorithms in discerning clean speech from noisy speech. This ability to detect clean speech allows a speaker recognition system to reach a recognition rate close to 87%, which nearly matches the performance of the system when no noise is present.
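As a rough illustration of the idea, the sketch below computes ApEn following the standard definition (Pincus, 1991) and keeps only the frames whose ApEn falls inside a pair of bounds. It is a minimal sketch, not the implementation used in the paper: the function names, the embedding dimension `m = 2`, the tolerance `r = 0.2 * std`, the non-overlapping framing, and the `lower` and `upper` bounds are all assumptions chosen for illustration.

```python
import numpy as np


def approximate_entropy(x, m=2, r=None):
    """ApEn of a 1-D sequence (standard definition, self-matches included).

    m and r are illustrative defaults, not values taken from the paper.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    if r is None:
        r = 0.2 * np.std(x)  # common rule of thumb, assumed here

    def phi(dim):
        # All overlapping template vectors of length `dim`.
        count = n - dim + 1
        templates = np.array([x[i:i + dim] for i in range(count)])
        # Chebyshev (max-norm) distance between every pair of templates.
        diff = np.abs(templates[:, None, :] - templates[None, :, :])
        dist = diff.max(axis=2)
        # Fraction of templates within tolerance r of each template.
        # Self-matches keep this strictly positive, so the log is safe.
        c = np.mean(dist <= r, axis=1)
        return np.mean(np.log(c))

    return phi(m) - phi(m + 1)


def select_frames(signal, frame_len, lower, upper, m=2):
    """Indices of non-overlapping frames whose ApEn lies in [lower, upper].

    `lower` and `upper` are hypothetical bounds supplied by the caller.
    """
    signal = np.asarray(signal, dtype=float)
    n_frames = signal.size // frame_len
    kept = []
    for k in range(n_frames):
        frame = signal[k * frame_len:(k + 1) * frame_len]
        if lower <= approximate_entropy(frame, m=m) <= upper:
            kept.append(k)
    return kept
```

A regular, quasi-periodic frame yields a lower ApEn than a noise-like frame, which is the property the frame selection relies on. In practice the bounds would be derived from ApEn statistics of clean speech, as the abstract describes.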
