Multizone Speech Reinforcement

In this article, we address speech reinforcement (near-end listening enhancement) for a scenario with several playback zones. In such a setting, signals from one zone can leak into other zones (crosstalk), degrading intelligibility and/or quality. We build an optimization framework on a signal model that accounts for the effects of noise, reverberation, and zone crosstalk simultaneously. By treating a general smooth distortion measure symbolically, we derive necessary optimality conditions in terms of the gradients of the distortion measure and the signal model. As an illustrative example of the framework, the conditions are then applied to the expected mean-square error (MSE) distortion under a hybrid stochastic-deterministic model for the corruptions. The result is a crosstalk cancellation algorithm that depends on the diffuse reverberation and the cross-zone direct-path components. Simulations validate the optimality of the algorithm and show a clear benefit of multizone processing over the iterated application of a single-zone algorithm. Comparisons with least-squares crosstalk cancellers from the literature also show the benefit of using a hybrid model.
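To make the idea of crosstalk cancellation concrete, the following is a minimal sketch of a regularized least-squares canceller for a single frequency bin. It is not the algorithm derived in the article. It assumes a hypothetical square matrix `H` of direct-path gains (entry `H[i, j]` couples loudspeaker `j` to zone `i`) and a hypothetical scalar `reverb_gain` that stands in for diffuse reverberation power growing with the total played power, so that it acts as a Tikhonov penalty on the precoder. The function names and the numbers are illustrative only.

```python
import numpy as np


def crosstalk_canceller(H, reverb_gain=0.0):
    """Regularized least-squares precoder G for one frequency bin.

    Minimizes ||H G - I||_F^2 + reverb_gain * ||G||_F^2.
    The penalty term is an assumption of this sketch: it mimics
    reverberant energy that scales with the played power.
    """
    H = np.asarray(H, dtype=complex)
    n_spk = H.shape[1]
    A = H.conj().T @ H + reverb_gain * np.eye(n_spk)
    # Solve A G = H^H instead of forming an explicit inverse.
    return np.linalg.solve(A, H.conj().T)


def leakage_db(H, G):
    """Crosstalk (off-diagonal) power over target (diagonal) power, in dB."""
    T = np.asarray(H, dtype=complex) @ G
    target = np.sum(np.abs(np.diag(T)) ** 2)
    cross = np.sum(np.abs(T) ** 2) - target
    return 10 * np.log10(cross / target)


# Hypothetical two-zone example with moderate leakage between zones.
H = np.array([[1.0, 0.3],
              [0.2, 1.0]])
G = crosstalk_canceller(H, reverb_gain=0.05)

print("leakage without processing [dB]:", leakage_db(H, np.eye(2)))
print("leakage with canceller     [dB]:", leakage_db(H, G))
```

With `reverb_gain=0.0` the precoder reduces to plain inversion of `H`. A positive value trades some residual crosstalk for lower output power, which is the kind of compromise a model that includes reverberation would be expected to make.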
