Temporal Speech Normalization Methods Comparison in Speech Recognition Using Neural Network

Speech signals vary both temporally and acoustically. Recognizing speech with a static-input neural network requires temporal normalization of the speech so that its length matches the number of input nodes of the network while preserving the properties of the speech. This paper compares three methods of temporal speech normalization, namely linear, extended linear, and zero-padded normalization, on isolated speech using different sets of learning parameters in a multilayer perceptron neural network with adaptive learning. Although previous work shows that linear normalization can achieve accuracy of up to 95% on a similar problem, the results of this experiment show the opposite: zero-padded normalization outperformed both linear normalization methods on every parameter set tested. The highest recognition rate using zero-padded normalization is 99%, while linear and extended linear normalization give only 74% and 76%, respectively. Before concluding, the paper compares the data used in previous work with linear normalization, which gave high accuracy, against the data used in this experiment, which performed more poorly.
