Structural optimization of deep belief network theorem for classification in speech recognition

Speech is the natural verbal form of human communication. Each spoken word consists of phonetic combinations of vowels and consonants. Speech recognition is an application of pattern recognition: it applies pattern matching to phonetic patterns to identify linguistic objects, including parts of speech. This study follows the traditional approach to speech recognition: the speech is recorded, features are extracted from the signal, the time-series speech data is analyzed with the Fast Fourier Transform (FFT), and the result is classified with a Deep Belief Network (DBN). A DBN provides both feature extraction and classification, and it is used in several applications, especially in image processing and signal processing. The aim of this study is to construct a semi-automated feature representation that can improve machine learning models, especially in speech recognition. The classification accuracy of a DBN depends on its structure. This study therefore optimizes the DBN structure with a combined evolutionary computation technique. The experimental results indicate that the optimized structure yields an improvement of 100% on a simple traditional dataset.
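The abstract does not specify the evolutionary technique, so the following is only a minimal sketch of how an evolutionary search over DBN hidden-layer structures could look. Everything in it is an assumption made for illustration: the names `toy_fitness`, `mutate`, and `evolve_structure`, the unit and depth limits, and above all the fitness function, which is a hypothetical stand-in for "train a DBN with this structure and measure its classification accuracy".

```python
import random


def toy_fitness(layers):
    # Hypothetical stand-in for training a DBN with these hidden-layer sizes
    # and returning its validation accuracy. It simply peaks at (64, 64).
    target = (64, 64)
    depth_penalty = abs(len(layers) - len(target)) * 50
    width_penalty = sum(abs(a - b) for a, b in zip(layers, target))
    return -(depth_penalty + width_penalty)


def mutate(layers, rng, min_units=8, max_units=256, max_depth=4):
    # Return a mutated copy: resize one layer, add a layer, or remove a layer.
    layers = list(layers)
    move = rng.choice(["resize", "add", "remove"])
    if move == "add" and len(layers) < max_depth:
        layers.append(rng.randint(min_units, max_units))
    elif move == "remove" and len(layers) > 1:
        layers.pop(rng.randrange(len(layers)))
    else:
        # Fall back to resizing when adding or removing is not allowed.
        i = rng.randrange(len(layers))
        step = rng.randint(-16, 16)
        layers[i] = max(min_units, min(max_units, layers[i] + step))
    return tuple(layers)


def evolve_structure(fitness, generations=200, population_size=20, seed=0):
    # Elitist search: keep the better half, refill with mutated survivors.
    rng = random.Random(seed)
    population = [
        tuple(rng.randint(8, 256) for _ in range(rng.randint(1, 4)))
        for _ in range(population_size)
    ]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        survivors = population[: population_size // 2]
        children = [mutate(rng.choice(survivors), rng) for _ in survivors]
        population = survivors + children
    return max(population, key=fitness)


best = evolve_structure(toy_fitness)
print("best hidden-layer structure found:", best)
```

In a real experiment, `toy_fitness` would be replaced by a function that trains a DBN on FFT-derived speech features and returns held-out accuracy. Because each evaluation would then require a full training run, the population size and the number of generations would have to be chosen with that cost in mind.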
