Generalized Spherical Array Beamforming for Binaural Speech Reproduction

Microphone arrays are used in speech signal processing applications such as teleconferencing and telepresence to enhance a desired speech signal in the presence of competing talkers, reverberation, and background noise. These arrays usually provide a single-channel output, so no spatial information is available in the output signal. However, spatial information about the sound sources may increase the intelligibility of speech perceived by a human listener. This work presents a mathematical framework for generalized spherical array beamforming that, in addition to suppressing noise and reverberation, aims to preserve spatial information about the sources in the recording venue. The generalized beamformer, formulated in the spherical harmonics domain, is based on binaural sound reproduction in which head-related transfer functions are incorporated for presentation over headphones. The performance of the proposed generalized beamformer is compared with that of a maximum-directivity beamformer with a single-channel output. Listening tests with human subjects show that the generalized beamformer improves intelligibility at low input SNRs.
