Speech Recognition Employing MFCC and Dynamic Time Warping Algorithm

Speech is an integral part of human life and one of the most natural means of human communication. As a result, software based on speech recognition enjoys a high degree of acceptance and has a wide range of applications in defense, security, health care, and home automation. Speech is a non-stationary signal whose characteristics change rapidly over time; examined over a very short time scale, however, it can be treated as a stationary signal with very small variations. In this paper, the authors address the recognition of a single user from multiple isolated spoken words. The system uses Mel-frequency cepstral coefficients (MFCCs) for feature extraction and dynamic time warping (DTW) for feature matching, both chosen for their simplicity and efficiency. Short-time spectral analysis, the core of the MFCC algorithm, is used in the feature extraction stage. DTW is used to compare two signals that differ in speaking rate or are shifted in time relative to each other. Because two utterances of the same word are never exactly alike, DTW is well suited to comparing two spoken words.
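
The matching step described above can be illustrated with a minimal sketch of textbook DTW in Python. This is not the authors' implementation: the function names `dtw_distance` and `recognize`, the Euclidean frame distance, and the unconstrained step pattern are all assumptions made for illustration. Each input is assumed to be a sequence of feature frames (for example, one MFCC vector per short-time frame); plain 1-D sequences are accepted as well.

```python
import numpy as np


def _as_frames(x):
    """Return x as a 2-D array of shape (frames, dims)."""
    x = np.asarray(x, dtype=float)
    return x[:, None] if x.ndim == 1 else x


def dtw_distance(a, b):
    """Accumulated DTW cost between two feature sequences.

    Assumption: Euclidean distance between frames and the basic
    step pattern (insertion, deletion, match) with no window constraint.
    """
    a, b = _as_frames(a), _as_frames(b)
    n, m = len(a), len(b)
    # cost[i, j] = best accumulated cost aligning a[:i] with b[:j]
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],      # a advances alone
                                 cost[i, j - 1],      # b advances alone
                                 cost[i - 1, j - 1])  # both advance
    return float(cost[n, m])


def recognize(query, templates):
    """Return the label of the stored template closest to the query.

    `templates` is a hypothetical dict mapping word labels to
    reference feature sequences recorded from the enrolled user.
    """
    return min(templates, key=lambda word: dtw_distance(query, templates[word]))
```

With this sketch, a slower rendition of the same sequence aligns at zero cost, which is the property that makes DTW attractive for utterances spoken at different speeds: `dtw_distance([1, 2, 3], [1, 1, 2, 2, 3, 3])` evaluates to `0.0`. A practical system would typically also normalize the cost by path length and restrict the warping window.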
