Microphone front-ends for spatial sound analysis and synthesis with Directional Audio Coding

Aalto University, P.O. Box 11000, FI-00076 Aalto, www.aalto.fi

Author: Jukka Ahonen
Name of the doctoral dissertation: Microphone front-ends for spatial sound analysis and synthesis with Directional Audio Coding
Publisher: School of Electrical Engineering
Unit: Department of Signal Processing and Acoustics
Series: Aalto University publication series DOCTORAL DISSERTATIONS 33/2013
Field of research: Acoustics and Audio Signal Processing
Manuscript submitted: 17 September 2012
Date of the defence: 8 March 2013
Permission to publish granted (date): 14 December 2012
Language: English
Monograph / Article dissertation (summary + original articles)

Abstract

A large number of professional and domestic audio applications utilize spatial sound reproduction. In addition to conventional applications, such as surround sound in movie and home theaters, spatial sound is also applied for telecommunication purposes. In teleconferencing, for instance, sound emitted by talkers can be captured with multiple microphones at one end and reproduced in a spatially distributed manner with multiple loudspeakers at the other. This offers benefits over typical monophonic reproduction of the teleconference in terms of speech intelligibility and other elements of communication.

During the last decade there has been increasing research interest in parametric spatial sound processing. Several techniques for estimating the directional parameters of a sound field from multichannel audio files or from microphone signals have been proposed. In the parametric techniques, the directional information can be efficiently transmitted and then applied to spatial sound synthesis for various purposes. This thesis discusses Directional Audio Coding (DirAC) for capturing, transmitting and reproducing spatial sound. The perceptually motivated time-frequency processing of DirAC provides a parametric description of spatial sound, namely the arrival direction and diffuseness of sound. Direction and diffuseness, when analyzed in the time-frequency resolution of human hearing, are assumed to transmit enough information on the captured sound field for spatial hearing. DirAC has several applications in spatial audio, of which teleconferencing is the main focus here.

The author's research addresses the development of different microphone front-ends for DirAC. Methods to analyze a sound field with input from arrays of omnidirectional microphones and from typical directional stereo microphones were studied. A novel method for diffuseness estimation was developed as part of this work. Microphone arrays that exploit acoustic shadowing between microphones are also proposed as an acoustical front-end for DirAC, as are methods to conduct directional analysis with such arrays. These methods overcome the issues that occur in direction analysis with input from conventional microphone arrays, and thus provide reliable direction estimates over the entire audio frequency range. In the thesis, DirAC processing is also applied to bilaterally fitted hearing aids with two microphones at each ear. The use of the different microphone front-ends is evaluated through measurements and listening tests.
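The two DirAC parameters named in the abstract, arrival direction and diffuseness, can be illustrated with a minimal sketch. It uses the common energetic formulation (time-averaged active intensity and energy density) and assumes that complex sound pressure and 2-D particle velocity are already available for one frequency band; the thesis itself concerns how to obtain such quantities from practical microphone front-ends, which this sketch does not cover. The function names, the helper `plane_wave`, and the constants `RHO0` and `C` are illustrative assumptions, not taken from the thesis.

```python
import math

RHO0 = 1.2   # air density in kg/m^3 (assumed value)
C = 343.0    # speed of sound in m/s (assumed value)

def plane_wave(azimuth_deg, amplitude=1.0):
    """Hypothetical helper: pressure and 2-D particle velocity of one
    plane wave arriving FROM the given azimuth (single time frame)."""
    az = math.radians(azimuth_deg)
    p = complex(amplitude)
    # The wave propagates away from its source, i.e. opposite to the DOA.
    nx, ny = -math.cos(az), -math.sin(az)
    scale = p / (RHO0 * C)
    return p, (scale * nx, scale * ny)

def dirac_parameters(p, u):
    """Estimate azimuth (degrees) and diffuseness for one frequency band.

    p: list of complex pressure values, one per time frame
    u: list of (ux, uy) complex particle-velocity pairs, same length
    """
    n = len(p)
    # Time-averaged active intensity: <I> = 0.5 * Re{conj(p) * u}
    ix = sum(0.5 * (pk.conjugate() * ux).real for pk, (ux, _) in zip(p, u)) / n
    iy = sum(0.5 * (pk.conjugate() * uy).real for pk, (_, uy) in zip(p, u)) / n
    # Time-averaged energy density of the sound field
    e = sum(RHO0 / 4 * (abs(ux) ** 2 + abs(uy) ** 2)
            + abs(pk) ** 2 / (4 * RHO0 * C ** 2)
            for pk, (ux, uy) in zip(p, u)) / n
    # Direction of arrival is opposite to the net energy flow.
    azimuth = math.degrees(math.atan2(-iy, -ix))
    # Diffuseness: 0 for a single plane wave, 1 when the net flow vanishes.
    diffuseness = 1.0 - math.hypot(ix, iy) / (C * e) if e > 0 else 1.0
    return azimuth, min(max(diffuseness, 0.0), 1.0)

# One talker at 60 degrees over four frames: direct sound dominates.
frames = [plane_wave(60.0) for _ in range(4)]
az_direct, psi_direct = dirac_parameters([f[0] for f in frames],
                                         [f[1] for f in frames])

# Equal-energy sound alternating between opposite directions:
# the averaged intensity cancels, so the field is classed as diffuse.
frames = [plane_wave(0.0), plane_wave(180.0)]
_, psi_diffuse = dirac_parameters([f[0] for f in frames],
                                  [f[1] for f in frames])
```

In a full system this analysis would be repeated per frequency band at a resolution approximating that of human hearing, and the averaging length would be a tuning choice; both are left out here.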
