Speech recognition system using enhanced mel frequency cepstral coefficient with windowing and framing method

Speech recognition systems are now used in a variety of environments, including healthcare, robotics, vehicle control, and unmanned aerial vehicle systems. In recent years, many speech recognition systems have been developed to address problems in real-world applications. We propose a novel speech recognition system that uses enhanced mel-frequency cepstral coefficients (MFCC) with a windowing and framing method. The windowing and framing method is used to remove the Gaussian white noise present in the input speech signal. The de-noising block uses the nonnegative matrix factorization algorithm to factorize the mel-magnitude spectra of the noisy input audio signal. MFCCs are then used to extract the most important features of the speech signal. Finally, Laplace smoothing is used as the language model for recognizing the audio signals. The proposed MFCC with windowing and framing based speech recognition system is demonstrated in MATLAB. We compare the proposed system with wavelet-based and artificial neural network-based feature extraction methods for speech recognition. The experimental results indicate good performance for the proposed MFCC with windowing and framing based speech recognition system.
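
The framing and windowing step named in the abstract can be illustrated with a minimal sketch. The abstract does not give the frame length, the hop size, the sampling rate, or the window function, so the 25 ms frames, 10 ms hop, 16 kHz rate, and Hamming window below are assumptions chosen because they are common defaults in MFCC front ends. The function names are hypothetical, and the code is written in Python for illustration rather than in the MATLAB used by the authors.

```python
import math


def frame_signal(samples, frame_len, hop_len):
    """Split a 1-D sequence into overlapping frames.

    Frames start every hop_len samples; a trailing partial frame is dropped.
    """
    frames = []
    start = 0
    while start + frame_len <= len(samples):
        frames.append(list(samples[start:start + frame_len]))
        start += hop_len
    return frames


def hamming_window(n):
    """Return the symmetric Hamming window w[k] = 0.54 - 0.46*cos(2*pi*k/(n-1))."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * k / (n - 1)) for k in range(n)]


def windowed_frames(samples, frame_len, hop_len):
    """Frame the signal, then taper each frame with a Hamming window."""
    window = hamming_window(frame_len)
    return [
        [x * w for x, w in zip(frame, window)]
        for frame in frame_signal(samples, frame_len, hop_len)
    ]


# Assumed parameters (not taken from the paper).
sample_rate = 16000                      # samples per second
frame_len = sample_rate * 25 // 1000     # 25 ms -> 400 samples
hop_len = sample_rate * 10 // 1000       # 10 ms -> 160 samples

# One second of a 440 Hz tone as a stand-in for a speech signal.
signal = [math.sin(2.0 * math.pi * 440.0 * t / sample_rate) for t in range(sample_rate)]

frames = windowed_frames(signal, frame_len, hop_len)
print(len(frames), len(frames[0]))
```

Each windowed frame would then be passed to a spectral transform and mel filterbank to obtain MFCCs. The tapering reduces the spectral leakage caused by cutting the signal at frame boundaries; it does not by itself implement the de-noising block described in the abstract.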
