Investigations on spatial sound design based on measured room impulse responses

Developments in the area of spatial sound reproduction have led to a large variety of established audio systems. Systems based on stereophonic principles have been extended, growing from two channels via the ITU-R BS.775 surround setup to larger setups with more channels, including elevated loudspeakers. Sound field reproduction systems that aim to reconstruct an acoustic field, such as Wave Field Synthesis (WFS) and Ambisonics, are on the verge of becoming available on the market. Additionally, binaural reproduction is well established for simulation and auralization applications as well as for psychoacoustic research, and is now entering the mass market through the success of smartphones and other devices. All of these systems are termed spatial audio reproduction systems.

Only very few applications of spatial reproduction systems aim for a natural reproduction of a recorded situation. In most cases the aim is to communicate artistic messages or ideas:

• A recorded music performance transformed into a spatial reproduction by sound engineers.
• A purely virtual piece of music (e.g., pop music produced in a studio, or electronic music).
• A virtual piece of acoustic art (e.g., a radio drama).
• An audio-visual artwork (e.g., a movie and its corresponding sound track).

Most applications do not reproduce a real acoustic environment; the spatial audio scene is a purely virtual construct. The development and realization of such a scene is termed sound design. The sound designer tries to communicate an acoustic idea and needs to transform this abstract concept into acoustic reality in a given environment with a given reproduction system. Such a concept of sound is not necessarily described by physical models, i.e., geometrical room models with an arrangement of real sound sources. Furthermore, such an acoustic idea does not and should not depend on a particular reproduction system.

Over the last decades of audio signal processing development, countless tools for the modification of single audio streams have been created (e.g., equalizers, compressors, and modulation effects such as chorus). All of these tools modify a property of a single audio stream. The sound designer transforms his acoustic idea into a parameter set for processing devices in order to reach his goals of acoustic communication. Besides artistic knowledge, a strong background in signal processing and in the interaction of both is required. In particular, knowledge of the perceptual effect of modifying a property of an audio stream is the key element in the expertise of a sound engineer, sound designer, or Tonmeister. In addition, spatial sound design modifies the spatial properties of an audio stream, including its position, direction, and orientation in a virtual room, as well as the acoustic characteristics of the room itself.

For the sound design process, two possible models are (contrasted in the sketch below):

1. The virtual acoustic scene is described as an object-oriented virtual reality composed of sound objects and sound-manipulating objects (e.g., walls). An acoustic environment as it can be found in the physical world is modeled.
2. The virtual acoustic scene and the corresponding acoustic field are visualized in terms of direction-dependent perceptual or physical properties, without any representation of physically possible objects or sound sources.

The first model limits the sound designer by physical constraints, which are part of the scene description and have to be implemented on the basis of simulations; moreover, the sound designer has to adapt his acoustic idea to simulated objects.
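To make the distinction concrete, the following minimal Python sketch contrasts the two descriptions. The class names and fields are hypothetical illustrations chosen for this summary, not the data model actually used in this work.

```python
from dataclasses import dataclass, field
from typing import List, Tuple
import numpy as np

# --- Model 1: object-oriented virtual reality (physically meaningful objects) ---
@dataclass
class SoundSource:
    position: Tuple[float, float, float]   # (x, y, z) in metres
    signal: np.ndarray                     # dry source signal

@dataclass
class Wall:
    vertices: List[Tuple[float, float, float]]  # polygon of the reflecting surface
    absorption: float                            # averaged absorption coefficient

@dataclass
class GeometricScene:
    sources: List[SoundSource] = field(default_factory=list)
    walls: List[Wall] = field(default_factory=list)
    # rendering this description requires a room acoustic simulation

# --- Model 2: direction-dependent representation (no physical objects) ----------
@dataclass
class DirectionalImpulseResponses:
    azimuth: np.ndarray     # analysis directions in radians, shape (D,)
    elevation: np.ndarray   # shape (D,)
    responses: np.ndarray   # impulse response per direction, shape (D, N); editable directly
```

The point of the contrast: the first representation can only be rendered through a simulation of sources and walls, whereas the second can be edited and auralized directly, without any geometric counterpart.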
A more intuitive way to modify the sound field is direct interaction with a graphical representation, as provided by the second approach. The main objective of this thesis is to develop a spatial sound design system that is not bound by descriptions of physical and geometrical room acoustics and whose interaction principles are independent of the reproduction system. The focus of this work lies on the design of acoustic environments, but the introduced principles can easily be extended. It is important to note that the basic material is the analysis of an existing room or sound field, which can then be modified using the principles of spatial sound design; a geometrical or physical correspondence is not required at this point.

In this work, a novel processing chain for spatial sound design based on measured room impulse responses has been developed. The system builds on spherical array measurements of room impulse responses, and new interaction methods for sound designers have been developed on the basis of such measurements. The interaction principles developed are independent of specific reproduction systems and rely on direct interaction with visualizations of spatial impulse responses. Following this processing chain, each building block has been analyzed in detail.

For the acquisition of spatial impulse responses, the properties of virtual open-sphere cardioid microphone arrays were investigated in terms of error robustness and spatial sampling. Virtual means in this context that the impulse responses are measured consecutively with a single microphone mounted on a robotic arm. The capabilities of such a system for extrapolation and plane wave decomposition were analyzed, and the combination of simultaneous measurements at different radii to extend the usable frequency range was investigated and implemented. The analysis of measured room impulse responses has been described, and a storage format for the analyzed spatial impulse responses has been developed and used in the realized processing framework. The extraction of single events (e.g., reflections) from spherical array measurements was studied using measurements in an anechoic room equipped with a single reflecting surface; it was shown that, with adequate spatio-temporal filtering, the frequency response of a reflection can be extracted.

A taxonomy of impulse response visualizations was given in order to develop a reproduction-system-independent interaction method for spatial sound design, and suitable techniques for direct interaction with different visualizations of direction-dependent impulse responses were investigated and implemented. The interaction process was classified into static and dynamic interactions. Static interaction modifies those parts of the room impression that are independent of the source position and of a specific reproduction system, whereas dynamic interaction depends on the source and reproduction system configuration. For the static interaction, new methods such as inverse energy decay curve editing (illustrated in the sketch below) and the concept of shaping surfaces were developed and applied. Time-variant filtering based on the short-time Fourier transform and spatial envelope shaping are further new principles studied in this context. The dynamic interaction was analyzed, and methods for an efficient design of spatial acoustic scenes were developed.
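The inverse energy decay curve editing named above operates on the decay of the (direction-dependent) impulse responses. The following simplified, single-channel Python sketch shows the kind of operation involved: it computes the Schroeder energy decay curve and reshapes the decay towards a target reverberation time with a time-varying gain. It is a minimal sketch under assumed parameters, not the exact algorithm developed in the thesis.

```python
import numpy as np

def energy_decay_curve(ir):
    """Schroeder backward integration, normalized to 0 dB at t = 0."""
    edc = np.cumsum(ir[::-1] ** 2)[::-1]
    return 10.0 * np.log10(edc / edc[0])

def reshape_decay(ir, fs, target_rt60):
    """Reshape a roughly exponential decay towards a target RT60.

    A per-sample gain compensates the difference between the measured
    and the desired decay rate.
    """
    t = np.arange(len(ir)) / fs
    edc_db = energy_decay_curve(ir)
    # crude estimate of the current decay rate from the -5 dB / -35 dB points
    i5 = np.argmax(edc_db <= -5.0)
    i35 = np.argmax(edc_db <= -35.0)
    current_rt60 = 2.0 * (t[i35] - t[i5])  # 30 dB span extrapolated to 60 dB
    # gain in dB that converts the current slope into the target slope
    gain_db = -60.0 * t * (1.0 / target_rt60 - 1.0 / current_rt60)
    return ir * 10.0 ** (gain_db / 20.0)

# usage with synthetic data: shorten a decay of roughly 1.5 s to 0.8 s
fs = 48000
ir = np.random.randn(fs) * np.exp(-np.arange(fs) / fs * 6.9 / 1.5)
shortened = reshape_decay(ir, fs, target_rt60=0.8)
```

In the actual design process such a reshaping would be applied per direction and controlled graphically by editing the displayed decay curve.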
The proposed interaction techniques have been realized within a graphical user interface for desktop use. An extension towards new user interfaces has been described, and prototypes have been realized using an augmented reality framework and state-of-the-art user interface hardware. The dynamic interaction was studied for wave field synthesis, stereophonic reproduction, and binaural reproduction. Methods for the auralization and adaptation of measured and modified high-resolution data to these three reproduction systems were developed. The effects of measurement errors on the reproduction quality were investigated in listening experiments employing simulated measurement data based on mirror image source models and diffuse field simulations. The advantages of dual-radius cardioid spherical microphone arrays and of the developed adaptation methods for stereo, binaural, and WFS reproduction have been demonstrated.

The methods proposed and developed in this thesis are based on the measurement of room impulse responses. The next step is the application of these methods to array recordings of complete performances. To make this possible, a multichannel array recording and analysis system with very high spatial resolution has to be developed. Since the complexity and cost of real-time recording are very high compared with the room impulse response measurements used in this thesis, a model that optimizes such a system on the basis of human perception will be very important. The author believes that spatio-temporal editing of recorded events will become available in the future and will extend the freedom of the sound design process in an unprecedented way. One can think of a simultaneous recording of an acoustic event in which the spatial arrangement and the properties of the different sources can be changed and edited in post-production without recording each source separately.
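Returning to the auralization step summarized above: for the binaural case it essentially amounts to convolving a dry source signal with a binaural room impulse response pair. The following sketch illustrates this with placeholder signals and sampling rate; in practice the measured or modified responses from the processing chain would be used.

```python
import numpy as np
from scipy.signal import fftconvolve

def binaural_auralization(dry, brir_left, brir_right):
    """Convolve a dry mono signal with a binaural room impulse response pair."""
    left = fftconvolve(dry, brir_left)
    right = fftconvolve(dry, brir_right)
    out = np.stack([left, right], axis=-1)
    return out / np.max(np.abs(out))   # normalize to avoid clipping

# usage with placeholder data: 1 s of noise as the "dry" source and a decaying
# noise pair standing in for a measured BRIR
fs = 48000
dry = np.random.randn(fs)
t = np.arange(fs // 2) / fs
brir_l = np.random.randn(fs // 2) * np.exp(-t * 8.0)
brir_r = np.random.randn(fs // 2) * np.exp(-t * 8.0)
stereo = binaural_auralization(dry, brir_l, brir_r)   # shape: (samples, 2)
```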