Increasing the environment-awareness of rake beamforming for directive acoustic sources

Speech signals captured by distant microphones in enclosures are typically degraded by reverberation and background noise. Commonly, the quality of the signals is enhanced by applying delay-and-sum beamforming (or one of its variants) to a microphone array. However, under particular conditions, the multi-path acoustic propagation that causes reverberation is not entirely detrimental and can be exploited constructively. To this end, mirrored (virtual) microphones have been successfully applied in various research areas. In addition, most naturally occurring sound sources, such as a human speaker, exhibit a certain degree of radiation directivity, which, coupled with data-independent beamforming, has been shown to slightly increase the quality of the captured speech. Building upon the concepts of environment awareness and the acoustic rake receiver, this paper investigates the use of mirrored microphones, associated with isolated and strong reflections, in combination with source directivity, to further improve the quality of the captured speech. Real data gathered with a linear nested array, as well as simulated data, are used to test the proposed scheme, which shows superior performance with respect to similar state-of-the-art solutions.
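To make the idea of mirrored microphones concrete, the following is a minimal sketch of a rake-style delay-and-sum weight computation at a single frequency. It is an illustration under stated assumptions, not the scheme evaluated in the paper: it assumes a 2-D geometry, a single reflecting wall at x = wall_x, a known source position, and a free-field phase-only propagation model, and it ignores wall attenuation and source directivity. The function names (mirror_across_wall, steering_vector, rake_das_weights) and the speed-of-sound constant are hypothetical choices made for this example.

```python
import numpy as np

C = 343.0  # assumed speed of sound in m/s


def mirror_across_wall(points, wall_x):
    """Mirror 2-D points across the wall x = wall_x (image method)."""
    mirrored = np.array(points, dtype=float)  # copy, so the input is untouched
    mirrored[:, 0] = 2.0 * wall_x - mirrored[:, 0]
    return mirrored


def steering_vector(points, source, freq):
    """Phase-only free-field terms exp(-j*2*pi*f*d/c) from source to each point."""
    dist = np.linalg.norm(np.asarray(points, dtype=float) - source, axis=1)
    return np.exp(-2j * np.pi * freq * dist / C)


def rake_das_weights(mics, source, wall_x, freq):
    """Unit-norm weights that sum the direct-path and first-reflection steering vectors.

    The reflected path to each real microphone has the same length as the
    straight path from the source to the corresponding mirrored microphone.
    """
    a_direct = steering_vector(mics, source, freq)
    a_reflect = steering_vector(mirror_across_wall(mics, wall_x), source, freq)
    a_sum = a_direct + a_reflect
    return a_sum / np.linalg.norm(a_sum)


# Example: a three-element line array, a wall at x = 3 m, evaluated at 1 kHz.
mics = np.array([[0.0, 0.0], [0.1, 0.0], [0.2, 0.0]])
source = np.array([1.0, 2.0])
weights = rake_das_weights(mics, source, wall_x=3.0, freq=1000.0)
```

For a frequency-domain snapshot x of the array signals, the beamformer output would be np.vdot(weights, x). A directivity-aware variant would additionally scale each path by the source radiation gain toward that path, which this sketch leaves out.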
