Improved Feature Parameter Extraction from Speech Signals Using Machine Learning Algorithm

Speech recognition refers to the capability of software or hardware to receive a speech signal, identify the speaker’s features in that signal, and then recognize the speaker. In general, the speech recognition process involves three main steps: acoustic processing, feature extraction, and classification/recognition. The purpose of feature extraction is to represent a speech signal using a predetermined number of signal components, because the full information in the acoustic signal is too cumbersome to handle and some of it is irrelevant to the identification task. This study proposes a machine learning-based approach that extracts feature parameters from speech signals to improve the performance of speech recognition applications in real-time smart city environments. Implementing such processes in real-time systems requires a high computation speed, so processing speed plays an important role in real-time speech recognition. It calls for modern technologies and fast algorithms that accelerate the extraction of feature parameters from speech signals. Problems with acceleration in the digital processing of speech signals have yet to be completely resolved. To reduce computing time, the proposed approach makes efficient use of the principle of mapping a block of main memory to the cache; the block size of cache memory is a parameter that strongly affects cache performance. The experimental results demonstrate that the proposed method successfully extracts the signal features and achieves seamless classification performance compared with conventional speech recognition algorithms.
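
The idea of representing a signal with a predetermined number of components per frame, computed block by block, can be illustrated with a minimal sketch. It is not the method of this study: the two features (log-energy and zero-crossing rate), the function name `frame_features`, and the parameters `frame_len`, `hop`, and `block_frames` are all assumptions chosen for illustration. In pure Python the blocking only shows the loop structure; any real cache benefit would appear in a compiled implementation where the block size is matched to the cache.

```python
import math


def frame_features(signal, frame_len=400, hop=160, block_frames=64):
    """Return one fixed-size feature tuple (log_energy, zcr) per frame.

    All names and default values are illustrative assumptions,
    not taken from the paper.
    """
    # Hamming window, computed once and reused for every frame.
    window = [
        0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
        for n in range(frame_len)
    ]
    if len(signal) < frame_len:
        return []
    n_frames = 1 + (len(signal) - frame_len) // hop

    features = []
    # Walk the frames in contiguous blocks so that each block touches
    # a compact region of the sample buffer before moving on.
    for block_start in range(0, n_frames, block_frames):
        block_end = min(block_start + block_frames, n_frames)
        for i in range(block_start, block_end):
            frame = signal[i * hop : i * hop + frame_len]
            # Feature 1: log of the windowed short-time energy.
            energy = sum((s * w) ** 2 for s, w in zip(frame, window))
            log_energy = math.log(energy + 1e-12)
            # Feature 2: fraction of adjacent sample pairs that change sign.
            crossings = sum(
                1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
            )
            zcr = crossings / (frame_len - 1)
            features.append((log_energy, zcr))
    return features


if __name__ == "__main__":
    # One second of a 440 Hz tone sampled at 16 kHz (synthetic test input).
    tone = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
    feats = frame_features(tone)
    print(len(feats), "frames,", len(feats[0]), "features per frame")
```

Each frame is reduced to the same small number of values regardless of how many samples it contains, which is the property the abstract attributes to feature extraction. The block size changes only the order of memory access, not the result.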
