FPGA Implementation of a Phase-Aware Single-Channel Speech Enhancement System

This paper presents a real-time architecture for an improved single-channel speech enhancement system based on phase-aware multi-band complex spectral subtraction. In the proposed technique, the short-time spectral magnitude of the clean speech signal is estimated by taking into account the spectral phases of the speech and noise components. The estimated spectral phase of the clean speech signal is also used to reconstruct the signal in the time domain. The proposed system consists of a basic preprocessing module followed by a short-time Fourier transform (STFT) analyzer, a noise power estimator based on improved minima controlled recursive averaging (IMCRA), a phase estimator unit, and an overlap-add synthesis unit. The architecture is implemented on a Field Programmable Gate Array (FPGA) using the Xilinx ISE tool, and its overall resource utilization and maximum operating frequency are reported for a Virtex-6 FPGA chip. Experimental results show that the proposed speech enhancement framework outperforms existing standard benchmark methods on several speech quality and intelligibility assessment metrics.
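The idea behind estimating the clean magnitude "by taking into account the spectral phases" can be illustrated with a minimal Python sketch. This is not the authors' FPGA design. It assumes that the noise magnitude (obtained in the paper by the IMCRA estimator) and the speech and noise phases are already available. It also uses floating point instead of fixed point and omits the multi-band processing. The sketch relies on the standard law-of-cosines relation for one STFT bin, Y = S + N, which shows that conventional power spectral subtraction is the special case where the speech-noise cross term is assumed to be zero. The function names `phase_aware_magnitude`, `reconstruct_bin`, `hann` and `overlap_add` are hypothetical.

```python
import cmath
import math


def phase_aware_magnitude(y_mag, n_mag, phase_diff):
    """Estimate the clean magnitude |S| of one STFT bin (illustrative sketch).

    For Y = S + N the law of cosines gives
        |Y|^2 = |S|^2 + |N|^2 + 2|S||N|cos(phase_diff),
    with phase_diff = phi_S - phi_N. Taking the larger root of this
    quadratic in |S| yields
        |S| = -|N|cos(phase_diff) + sqrt(|Y|^2 - |N|^2 sin^2(phase_diff)).
    With phase_diff = pi/2 the cross term vanishes and the result reduces
    to conventional power spectral subtraction, sqrt(|Y|^2 - |N|^2).
    """
    disc = y_mag ** 2 - (n_mag * math.sin(phase_diff)) ** 2
    if disc < 0.0:
        # No real solution (inconsistent estimates): floor the output at zero.
        return 0.0
    # Clamp negative roots to zero, as a magnitude cannot be negative.
    return max(0.0, -n_mag * math.cos(phase_diff) + math.sqrt(disc))


def reconstruct_bin(s_mag, s_phase):
    """Combine the estimated magnitude with an estimated clean-speech phase."""
    return cmath.rect(s_mag, s_phase)


def hann(n):
    """Periodic Hann window; for even n, copies shifted by n/2 sum to 1."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * k / n) for k in range(n)]


def overlap_add(frames, hop):
    """Sum equal-length time-domain frames placed hop samples apart."""
    if not frames:
        return []
    frame_len = len(frames[0])
    out = [0.0] * (hop * (len(frames) - 1) + frame_len)
    for i, frame in enumerate(frames):
        for k, sample in enumerate(frame):
            out[i * hop + k] += sample
    return out


if __name__ == "__main__":
    # Synthetic check: known speech and noise components in a single bin.
    s = cmath.rect(3.0, 0.4)
    n = cmath.rect(1.5, 2.0)
    y = s + n
    est = phase_aware_magnitude(abs(y), abs(n), cmath.phase(s) - cmath.phase(n))
    print(round(est, 6))  # recovers |S| = 3.0
```

When the true phases are known, the estimator recovers |S| exactly. In practice the result depends on how accurate the phase estimates are. In a hardware realization, the sine, cosine and square-root operations could be implemented with iterative units such as CORDIC rather than a math library.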