A New Biologically Inspired Fuzzy Expert System-Based Voiced/Unvoiced Decision Algorithm for Speech Enhancement

In this paper, we propose a speech enhancement approach for single-microphone systems. The main idea is to apply a different transformation to the speech signal depending on its voicing state. The voiced/unvoiced decision is based on multi-scale product analysis combined with fuzzy logic, which makes more cognitively inspired use of the speech information. Comb filtering is applied to the voiced frames of the noisy speech signal, and spectral subtraction is applied to the unvoiced frames of the same signal. The harmonics are further enhanced by a purpose-designed comb filter with adjustable bandwidth. The comb filter is tuned by an accurate fundamental frequency estimation method based on the multi-scale product analysis of the noisy speech. Experimental results show that the proposed approach reduces noise in adverse noise environments with little speech degradation and outperforms several competing methods.
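
As a rough illustration of the frame-wise switching described above, the following Python sketch routes each frame to a comb filter when it is voiced and to magnitude spectral subtraction otherwise. It is only a sketch under stated assumptions: the voicing flag and the fundamental frequency `f0` are taken as given inputs (the multi-scale product and fuzzy-logic stages are not reproduced here), the fixed-delay feedforward comb filter is a simple stand-in for the adjustable-bandwidth design, and the function names and the parameters `alpha` and `floor` are hypothetical.

```python
import numpy as np


def comb_filter(frame, f0, fs, alpha=0.8):
    """Feedforward comb filter: reinforce components periodic at f0.

    Illustrative stand-in only; alpha is an assumed mixing weight.
    """
    period = int(round(fs / f0))           # pitch period in samples
    frame = np.asarray(frame, dtype=float)
    out = frame.copy()
    if 0 < period < len(frame):
        # Average each sample with the one a pitch period earlier.
        out[period:] = (frame[period:] + alpha * frame[:-period]) / (1.0 + alpha)
    return out                             # first `period` samples pass through


def spectral_subtraction(frame, noise_mag, floor=0.02):
    """Magnitude spectral subtraction with a spectral floor.

    noise_mag is an assumed noise magnitude estimate of length
    len(frame) // 2 + 1; floor is an assumed flooring factor.
    """
    frame = np.asarray(frame, dtype=float)
    spec = np.fft.rfft(frame)
    mag = np.abs(spec)
    clean_mag = np.maximum(mag - noise_mag, floor * mag)
    # Reuse the noisy phase, as is common in spectral subtraction.
    clean_spec = clean_mag * np.exp(1j * np.angle(spec))
    return np.fft.irfft(clean_spec, n=len(frame))


def enhance_frame(frame, voiced, f0, fs, noise_mag):
    """Pick the transformation according to the (externally supplied) voicing state."""
    if voiced:
        return comb_filter(frame, f0, fs)
    return spectral_subtraction(frame, noise_mag)
```

With this stand-in filter, a component at an exact harmonic of `f0` passes at unit gain, while a component halfway between two harmonics is scaled by `(1 - alpha) / (1 + alpha)`; an adjustable-bandwidth design would control how sharply that transition occurs.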
