Improving speech privacy in personal sound zones

This paper proposes two methods for providing speech privacy between spatial zones in both anechoic and reverberant environments. Both methods mask the speech content that leaks from one zone into another, and the masking is optimised to maximise the speech intelligibility contrast (SIC) between the zones. The first method combines a uniform masker signal with the desired multizone loudspeaker signals, and therefore requires acoustic contrast between the zones. The second method computes a space-time domain masker signal in parallel with the loudspeaker signals, so that the combination of the two emphasises the spectral masking in the targeted quiet zone. Simulations show that a significant SIC can be achieved in anechoic environments whilst speech quality in the bright zone is maintained.
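Why the first method needs acoustic contrast can be seen with a toy calculation. The sketch below assumes that "uniform" means the masker arrives with equal power at every control point in both zones. Under that assumption the speech-to-masker ratio (SMR) in the bright zone exceeds the SMR in the dark zone by exactly the acoustic contrast, whatever the masker level. To show how a masker level might then be optimised for SIC, the sketch uses a hypothetical logistic map from SMR to a 0 to 1 intelligibility score (`toy_intelligibility`, standing in for an objective measure such as STOI) and a simple grid search. Neither is the paper's model or optimiser, and all function names are illustrative.

```python
import numpy as np

def mean_power(p):
    """Mean squared magnitude of the pressure samples at a zone's control points."""
    return np.mean(np.abs(p) ** 2)

def acoustic_contrast_db(p_bright, p_dark):
    """Acoustic contrast between the bright and dark zones, in dB."""
    return 10 * np.log10(mean_power(p_bright) / mean_power(p_dark))

def smr_db(p_speech, masker_power):
    """Speech-to-masker ratio in dB. ASSUMPTION: the masker has the same
    power in every zone (a 'uniform' masker)."""
    return 10 * np.log10(mean_power(p_speech) / masker_power)

def toy_intelligibility(smr, midpoint_db=-5.0, slope_db=3.0):
    """HYPOTHETICAL logistic map from SMR (dB) to a 0..1 intelligibility score.
    A stand-in for an objective measure such as STOI; not the paper's model."""
    return 1.0 / (1.0 + np.exp(-(smr - midpoint_db) / slope_db))

def toy_sic(p_bright, p_dark, masker_power):
    """Toy SIC: bright-zone minus dark-zone intelligibility under a uniform masker."""
    return (toy_intelligibility(smr_db(p_bright, masker_power))
            - toy_intelligibility(smr_db(p_dark, masker_power)))

# Example pressures at 16 control points per zone: the multizone speech signal
# has amplitude 1 in the bright zone and leaks into the dark zone at amplitude 0.1.
p_bright = np.ones(16, dtype=complex)
p_dark = np.full(16, 0.1, dtype=complex)
contrast = acoustic_contrast_db(p_bright, p_dark)  # about 20 dB

# Grid search over masker level, in dB relative to the bright-zone speech power.
masker_levels_db = np.linspace(-30.0, 20.0, 501)
sics = [toy_sic(p_bright, p_dark, mean_power(p_bright) * 10 ** (m / 10))
        for m in masker_levels_db]
best_level_db = masker_levels_db[int(np.argmax(sics))]
```

Because the SMR gap between the zones is fixed by the contrast, the masker level only decides where on the intelligibility curve the two zones sit. With this toy curve the best SIC occurs when the two SMRs straddle the curve's midpoint (a masker level of about -5 dB here). With zero contrast the two SMRs coincide and the toy SIC is zero at every masker level.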
