Harmonic beamformers for speech enhancement and dereverberation in the time domain

Abstract This paper presents a framework for parametric broadband beamforming that exploits the frequency-domain sparsity of voiced speech to achieve more noise reduction than traditional nonparametric broadband beamforming without introducing additional distortion. In this framework, the harmonic model parametrizes the signal of interest by a single parameter, the fundamental frequency, which makes it possible to perform both speech enhancement and dereverberation. The framework thus exploits the spatial and temporal properties of speech signals simultaneously and includes both fixed and adaptive beamformers: (1) delay-and-sum, (2) null forming, (3) Wiener, (4) minimum variance distortionless response (MVDR), and (5) linearly constrained minimum variance (LCMV) beamformers. Moreover, the framework contains standard broadband beamforming as a special case, so the proposed beamformers can also handle unvoiced speech. The reported experimental results demonstrate that the proposed framework can perform speech enhancement and dereverberation simultaneously. The proposed beamformers are evaluated in terms of speech distortion and objective measures of speech quality and speech intelligibility, and are compared to nonparametric broadband beamformers. The results show that the proposed beamformers compare favorably with traditional methods, including a state-of-the-art dereverberation method, particularly in adverse conditions with high levels of noise and reverberation.
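
For orientation, a harmonic model of the kind referred to above is commonly written as a sum of sinusoids at integer multiples of one fundamental frequency. The notation below is an assumption for illustration and is not taken from the abstract: $s(n)$ is the voiced speech signal, $L$ the number of harmonics, $A_l$ and $\phi_l$ the amplitude and phase of the $l$-th harmonic, $\omega_0$ the fundamental frequency in radians per sample, and $v(n)$ additive noise.

```latex
% Illustrative sketch of a real-valued harmonic signal model (assumed notation).
\begin{align}
  s(n) &= \sum_{l=1}^{L} A_l \cos\!\left( l\,\omega_0\, n + \phi_l \right), \\
  x(n) &= s(n) + v(n).
\end{align}
```

Under this form, the frequencies of all $L$ components are determined by the single parameter $\omega_0$, which is the sense in which the signal of interest is parametrized by the fundamental frequency.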
