A Review On Performance Of Voice Feature Extraction Techniques

In the digital era, computing applications must be secured against anonymous attacks by strengthening authentication credentials. Numerous methodologies and algorithms have been proposed that use human biometrics as a unique identity, and one such identity is the human voice print. The voice print is a characteristic unique to the individual, and a wide variety of techniques exist for representing and extracting features from digital speech signals. Voice recognition techniques have been implemented on different platforms and exploit different mathematical tools for voice feature extraction, which leads to differences in performance and results. In this paper, we investigate, analyze, and review the performance of numerous voice recognition techniques.
