Speech recognition using Principal Components Analysis and Neural Networks

In this paper, we introduce a new approach to recognizing isolated (discrete) speech, specifically words from a predefined vocabulary. Our approach is based mainly on Principal Components Analysis (PCA) and Neural Networks (NN). First, we build a database of 10 Persian words recorded by 20 speakers, each of whom uttered every word 5 times, giving 1,000 utterances in total. We then apply Voice Activity Detection (VAD) to remove the useless (non-speech) portions of each recording, compute Mel Frequency Cepstral Coefficients (MFCCs), which serve as the features for recognition, and apply PCA to reduce the dimensionality of the data before it is fed to the NN block. Feeding lower-dimensional inputs to the recognizer is a key feature of our approach: it speeds up training while keeping accuracy as high as possible. In other words, PCA reduces the amount of computation that most recognition systems must otherwise handle. We use 90% of the data set to train the system and the remaining 10% to test it and measure recognition accuracy.
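The abstract does not say which VAD algorithm is used. As an illustration only, the sketch below uses a simple short-time-energy detector, which is one common choice. The frame length, hop size, threshold ratio, and the helper names `frame_signal` and `energy_vad` are assumptions rather than details from the paper, and the test signal is synthetic.

```python
import numpy as np


def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames, one frame per row.

    The incomplete tail that does not fill a whole frame is dropped.
    """
    if len(x) < frame_len:
        raise ValueError("signal shorter than one frame")
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def energy_vad(x, frame_len=400, hop=160, ratio=0.1):
    """Keep frames whose short-time energy exceeds ratio * (peak frame energy).

    Assumed defaults (hypothetical, not from the paper): 25 ms frames with a
    10 ms hop at 16 kHz, and a relative threshold of 10% of the peak energy.
    """
    frames = frame_signal(np.asarray(x, dtype=float), frame_len, hop)
    energy = np.sum(frames ** 2, axis=1)
    mask = energy > ratio * energy.max()
    return frames[mask], mask


# Synthetic demo: 0.5 s of low-level noise, 0.5 s of a 440 Hz tone, 0.5 s of noise.
rng = np.random.default_rng(0)
sr = 16000
t = np.arange(sr // 2) / sr
lead = 0.001 * rng.standard_normal(sr // 2)
tail = 0.001 * rng.standard_normal(sr // 2)
signal = np.concatenate([lead, np.sin(2 * np.pi * 440 * t), tail])
speech_frames, mask = energy_vad(signal)
print(len(mask), int(mask.sum()))
```

A relative threshold is used here so the detector does not depend on the absolute recording level; real recordings with background noise usually need a more robust rule.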
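The MFCC parameters are not given in the abstract. The following is a minimal NumPy sketch of the textbook MFCC pipeline (Hamming window, power spectrum, triangular mel filterbank, log, DCT-II), assuming 16 kHz audio, a 512-point FFT, 26 mel filters, and 13 coefficients. These values and the helper names are hypothetical; pre-emphasis and delta coefficients are omitted for brevity.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters equally spaced on the mel scale, shape (n_filters, n_fft//2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):  # rising edge of the triangle
            fbank[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):  # falling edge of the triangle
            fbank[i - 1, k] = (right - k) / (right - center)
    return fbank


def dct_ii(x, n_out):
    """Orthonormal DCT-II along the last axis, keeping the first n_out coefficients."""
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    m = np.arange(n)[None, :]
    basis = np.sqrt(2.0 / n) * np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    basis[0] /= np.sqrt(2.0)
    return x @ basis.T


def mfcc(frames, sr=16000, n_fft=512, n_filters=26, n_coeffs=13):
    """MFCCs for a (n_frames, frame_len) array of speech frames (assumed parameters)."""
    window = np.hamming(frames.shape[1])
    spectrum = np.fft.rfft(frames * window, n=n_fft)
    power = (np.abs(spectrum) ** 2) / n_fft
    energies = power @ mel_filterbank(n_filters, n_fft, sr).T
    log_energies = np.log(energies + 1e-10)  # small offset avoids log(0)
    return dct_ii(log_energies, n_coeffs)


# Demo on random frames standing in for VAD output (10 frames of 400 samples).
rng = np.random.default_rng(0)
feats = mfcc(rng.standard_normal((10, 400)))
print(feats.shape)
```

Each row of `feats` is one frame's 13-dimensional MFCC vector, so an utterance becomes a matrix whose number of rows depends on its duration.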
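PCA can be computed from the singular value decomposition of the mean-centered data matrix: the rows of V^T are the principal directions, and the squared singular values give the variance each one explains. The sketch below assumes each utterance has already been turned into one fixed-length feature vector; the abstract does not say how variable-length MFCC sequences are made fixed-length, so that step is left out. The 40-dimensional random data and the names `pca_fit` and `pca_transform` are hypothetical.

```python
import numpy as np


def pca_fit(X, n_components):
    """Fit PCA on the rows of X (samples x features) via SVD of the centered data."""
    mean = X.mean(axis=0)
    _, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
    explained = (S ** 2) / np.sum(S ** 2)  # fraction of variance per component
    return mean, Vt[:n_components], explained[:n_components]


def pca_transform(X, mean, components):
    """Project X onto the fitted principal directions (reuse the training mean)."""
    return (X - mean) @ components.T


# Hypothetical data: 200 vectors of 40 features lying close to a 3-D subspace.
rng = np.random.default_rng(0)
Z = rng.standard_normal((200, 3))
W = rng.standard_normal((3, 40))
X = Z @ W + 0.01 * rng.standard_normal((200, 40))

mean, components, ratio = pca_fit(X, 3)
Y = pca_transform(X, mean, components)  # reduced inputs: 40 -> 3 features
X_hat = Y @ components + mean           # approximate reconstruction
print(Y.shape, ratio.sum())
```

In a real train/test setup, `pca_fit` would be called on the training vectors only, and the same `mean` and `components` would then be applied to the test vectors.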
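The network architecture and training settings are not specified in the abstract. As a hypothetical stand-in, the sketch below trains a one-hidden-layer MLP with full-batch gradient descent on synthetic, clearly separable clusters sized like the database (10 classes, 1,000 vectors) and evaluates it with a random 90%/10% train/test split. The hidden size, learning rate, epoch count, 20-dimensional inputs, and helper names are assumptions, and the printed accuracy says nothing about the paper's results.

```python
import numpy as np


def split_indices(n, test_frac=0.1, seed=0):
    """Randomly split n sample indices into train/test sets (default 90% / 10%)."""
    perm = np.random.default_rng(seed).permutation(n)
    n_test = int(round(test_frac * n))
    return perm[n_test:], perm[:n_test]


def train_mlp(X, y, n_classes, hidden=32, lr=0.5, epochs=500, seed=0):
    """Hypothetical one-hidden-layer tanh MLP with softmax output, full-batch gradient descent."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 1.0 / np.sqrt(X.shape[1]), (X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), (hidden, n_classes))
    b2 = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]  # one-hot targets
    n = X.shape[0]
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)
        logits = H @ W2 + b2
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)
        d_logits = (P - Y) / n  # gradient of mean cross-entropy w.r.t. logits
        d_H = (d_logits @ W2.T) * (1.0 - H ** 2)  # backprop through tanh
        W2 -= lr * (H.T @ d_logits)
        b2 -= lr * d_logits.sum(axis=0)
        W1 -= lr * (X.T @ d_H)
        b1 -= lr * d_H.sum(axis=0)
    return W1, b1, W2, b2


def predict(params, X):
    W1, b1, W2, b2 = params
    return np.argmax(np.tanh(X @ W1 + b1) @ W2 + b2, axis=1)


# Synthetic stand-in data: 10 classes x 100 vectors (cf. 10 words x 20 speakers x 5).
rng = np.random.default_rng(1)
n_classes, dim = 10, 20
labels = np.repeat(np.arange(n_classes), 100)
centers = rng.normal(0.0, 3.0, (n_classes, dim))
features = centers[labels] + rng.normal(0.0, 1.0, (labels.size, dim))

train_idx, test_idx = split_indices(labels.size)
mu = features[train_idx].mean(axis=0)
sd = features[train_idx].std(axis=0)
X_train = (features[train_idx] - mu) / sd
X_test = (features[test_idx] - mu) / sd  # reuse training statistics only

params = train_mlp(X_train, labels[train_idx], n_classes)
accuracy = np.mean(predict(params, X_test) == labels[test_idx])
print(f"test accuracy: {accuracy:.2f}")
```

Standardizing with statistics from the training split alone keeps the held-out 10% unseen during training, which is what makes the test accuracy a fair estimate.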
