Post-Filter Optimization for Multichannel Automotive Speech Enhancement

In an automotive environment, the quality of speech communication over hands-free equipment is often degraded by interfering car noise. To preserve the speech signal while removing the car noise, a multichannel speech enhancement system comprising a beamformer and a post-filter can be applied. Since a beamformer alone is insufficient to substantially reduce the level of car noise, a post-filter has to be applied to provide further noise reduction, especially at low frequencies. In this thesis, two novel post-filter designs are presented, along with their optimization for different driving conditions. The first post-filter design uses an adaptive smoothing factor for the power spectral density estimation as well as a hybrid noise coherence function. The hybrid noise coherence function is a mixture of the diffuse noise coherence function and the noise coherence function measured for a specific driving condition. The second post-filter design applies a new multichannel decision-directed a priori SNR estimator based on both temporal and spatial smoothing. Both post-filters are instrumentally optimized for different driving conditions: for the first post-filter, the optimal adaptive smoothing factor and the optimal hybrid noise coherence function are obtained; for the second post-filter, the weighting factors of the temporal and spatial smoothing parts are optimized. Compared to state-of-the-art post-filters, both post-filter designs with the optimized parameters improve the overall noise reduction performance significantly for different driving conditions.

Manually finding the optimal parameterization of a noise reduction algorithm is generally a time-consuming task. In this thesis, the two new post-filter designs are therefore instrumentally optimized using a figure of merit (FoM). We define the FoM as a quantity comprising three independent instrumental measures: the speech component quality, the level of noise attenuation, and the amount of musical tones. In particular, a new weighted log kurtosis ratio measure is proposed for the instrumental assessment of musical tones in a black-box test. It requires no knowledge of the internal variables of the noise reduction algorithm under test and can therefore be applied to a wide range of noise reduction algorithms. Subjective listening tests reveal that the weighted log kurtosis ratio measurements correlate highly with the perceived amount of musical tones. In addition, a single-channel application example is shown in which the smoothing factor and the a priori SNR floor of the decision-directed a priori SNR estimation are jointly optimized using an FoM. For some noise reduction algorithms, previously unknown optimal values of the parameters of interest are identified by the FoM-based instrumental optimization method and verified subjectively.
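To make the two parameters of the single-channel example concrete, the following is a minimal sketch of the classic decision-directed a priori SNR estimator for one frequency bin, followed by a Wiener gain. It is not the thesis's multichannel estimator. The function name, the default values of `beta` (smoothing factor) and `xi_min` (a priori SNR floor), the choice of a Wiener gain, and the use of the current frame's noise power in the recursive term are all assumptions made for illustration.

```python
def decision_directed_snr(noisy_power, noise_power, beta=0.98, xi_min=10 ** (-15 / 10)):
    """Track the a priori SNR of one frequency bin over successive frames.

    noisy_power: sequence of |Y(l)|^2 values, one per frame.
    noise_power: sequence of noise PSD estimates, one per frame.
    beta:        smoothing factor (illustrative default, not an optimized value).
    xi_min:      a priori SNR floor, linear scale (illustrative default of -15 dB).
    """
    xi_track = []
    gains = []
    prev_clean_power = 0.0  # |S_hat(l-1)|^2; no speech estimate before the first frame
    for y2, n2 in zip(noisy_power, noise_power):
        gamma = y2 / n2                       # a posteriori SNR
        ml_term = max(gamma - 1.0, 0.0)       # instantaneous estimate, clipped at zero
        # Weighted sum of the previous frame's estimate and the instantaneous term.
        xi = beta * prev_clean_power / n2 + (1.0 - beta) * ml_term
        xi = max(xi, xi_min)                  # apply the a priori SNR floor
        gain = xi / (1.0 + xi)                # Wiener gain (assumed weighting rule)
        prev_clean_power = gain * gain * y2   # power of the enhanced bin, |G * Y|^2
        xi_track.append(xi)
        gains.append(gain)
    return xi_track, gains
```

In this sketch, a `beta` close to 1 smooths the estimate strongly over time, while `xi_min` limits the maximum attenuation in noise-only frames. These are the two quantities an FoM-based search would vary jointly.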
