Recognition of Isolated Words using Features based on LPC, MFCC, ZCR and STE, with Neural Network Classifiers

This paper proposes an approach for recognizing the English words for the digits zero to nine, spoken in isolation by different male and female speakers. Mel Frequency Cepstral Coefficients (MFCC), Linear Predictive Coding (LPC) coefficients, Zero Crossing Rate (ZCR), and Short Time Energy (STE) of the audio signal are combined into a 63-element feature vector, which is then used for discrimination. Classification is performed by artificial neural networks (ANNs) with feedforward back-propagation architectures. Tested on a dataset of 280 speech samples, the combined feature set achieves an accuracy of 85%, which is higher than the accuracies obtained when the features are used individually.
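As an illustration of the two simplest features named above, the following is a minimal sketch of frame-wise ZCR and STE computation using NumPy. The frame length, hop size, test signal, and function names are assumptions chosen for the example; the abstract does not state the framing parameters or how the individual features are assembled into the 63-element vector.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames (trailing samples are dropped)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def zero_crossing_rate(frames):
    """Fraction of adjacent sample pairs in each frame whose signs differ."""
    signs = np.signbit(frames)
    return np.mean(signs[:, 1:] != signs[:, :-1], axis=1)

def short_time_energy(frames):
    """Mean squared amplitude of each frame."""
    return np.mean(frames ** 2, axis=1)

# Hypothetical input: one second of a 100 Hz sine sampled at 8 kHz.
sr = 8000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 100 * t)

# Assumed framing: 25 ms frames (200 samples) with a 12.5 ms hop (100 samples).
frames = frame_signal(x, frame_len=200, hop=100)
zcr = zero_crossing_rate(frames)
ste = short_time_energy(frames)
```

Each frame yields one ZCR value and one STE value. In a recognizer of the kind described, such per-frame values would have to be summarized or concatenated with the MFCC and LPC coefficients before being passed to the classifier; that step is not shown here because the abstract does not specify it.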
