A Novel K-Means Voice Activity Detection Algorithm Using Linear Cross Correlation on the Standard Deviation of Linear Predictive Coding

This paper presents a novel Voice Activity Detection (VAD) technique that can be easily applied to on-device isolated word recognition on mobile devices. The main speech features used are Linear Predictive Coding (LPC) speech features, which were correlated using the standard deviation of the signal. The output was then clustered using a modified K-means algorithm. The results show a significant improvement over a previous algorithm based on the LPC residual signal: on the same data, the new technique achieved a 90% recognition rate, compared with 86.6% for the previous algorithm. In some experiments, the technique achieved up to 97.7% recognition for female users. Its fast processing time makes it viable for mobile devices.
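The abstract does not give the exact feature computation, the form of the linear cross-correlation step, or the modifications made to K-means, so the following is only a rough sketch of the general idea under stated assumptions. It computes frame-wise LPC coefficients with the autocorrelation method (Levinson-Durbin), takes the standard deviation of each frame's coefficients as a one-dimensional feature, and separates frames with a plain two-cluster K-means. The function names (`frame_signal`, `lpc`, `two_means`, `vad`), the frame length, the hop size, and the LPC order are all hypothetical choices for illustration and are not taken from the paper; the cross-correlation step is not reproduced.

```python
import numpy as np

def frame_signal(x, frame_len=256, hop=128):
    """Split a 1-D signal into overlapping frames (assumed sizes)."""
    n = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n)[:, None]
    return x[idx]

def lpc(frame, order=10):
    """LPC coefficients via the autocorrelation method (Levinson-Durbin)."""
    w = frame * np.hamming(len(frame))
    # Autocorrelation lags 0..order
    r = np.correlate(w, w, mode="full")[len(w) - 1 : len(w) + order]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0] + 1e-12  # small offset guards against all-zero frames
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err  # reflection coefficient
        prev = a.copy()
        a[1:i] = prev[1:i] + k * prev[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
        if err <= 0.0:
            err = 1e-12
    return a[1:]

def two_means(values, iters=50):
    """Plain 1-D K-means with k=2, initialised at the min and max values."""
    c = np.array([values.min(), values.max()], dtype=float)
    labels = np.zeros(len(values), dtype=int)
    for _ in range(iters):
        # Label 1 means "closer to the higher centroid"
        labels = (np.abs(values - c[1]) < np.abs(values - c[0])).astype(int)
        new_c = c.copy()
        for j in (0, 1):
            if np.any(labels == j):
                new_c[j] = values[labels == j].mean()
        if np.allclose(new_c, c):
            break
        c = new_c
    return labels, c

def vad(x, frame_len=256, hop=128, order=10):
    """Boolean mask per frame: True where the frame falls in the
    high-variability cluster (assumed here to correspond to speech)."""
    frames = frame_signal(np.asarray(x, dtype=float), frame_len, hop)
    feat = np.array([np.std(lpc(f, order)) for f in frames])
    labels, _ = two_means(feat)
    return labels.astype(bool)
```

Calling `vad(x)` on a mono signal `x` returns one boolean per frame. The assumption behind the sketch is that background noise yields nearly flat LPC coefficients (low standard deviation), while resonant, speech-like frames yield coefficients with a larger spread; whether this matches the behaviour of the published method cannot be confirmed from the abstract alone.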
