3D-AUDIO OBJECT ORIENTED CODING

This thesis first presents a novel object-oriented scheme that provides for extensive description of time-varying 3D audio scenes using XML. The scheme, named XML3DAUDIO, provides a new format for encoding and describing 3D audio scenes in an object-oriented manner. Its creation was motivated by the fact that other 3D audio scene description formats are either too simplistic and lacking in realism (VRML), or too complex (MPEG-4 Advanced AudioBIFS) and, as a result, have not yet been fully implemented in available decoders and scene authoring tools. This thesis shows that the scene graph model, used by VRML and MPEG-4 AudioBIFS, leads to complex and inefficient 3D audio scene descriptions. This complexity results from the aggregation, in the scene graph model, of the scene content data and the scene temporal data. The resulting 3D audio scene descriptions are, in turn, difficult to re-author and significantly increase the complexity of 3D audio scene renderers. In contrast, XML3DAUDIO follows a new scene orchestra and score approach which separates the scene content data from the scene temporal data; this simplifies 3D audio scene descriptions and allows simpler 3D audio scene renderer implementations. In addition, the separation of the temporal and content data permits easier modification and re-authoring of 3D audio scenes. It is shown that XML3DAUDIO can be used as a new format for 3D audio scene rendering or, alternatively, as a metadata scheme for annotating 3D audio content.

Rendering and perception of the apparent extent of sound sources in 3D audio displays is then considered. Although perceptually important, the extent of sound sources is one of the least studied auditory percepts and is often neglected in 3D audio displays. This research aims to improve the realism of rendered 3D audio scenes by reproducing the multidimensional extent exhibited by some natural sound sources (e.g. a beach front, a swarm of insects, wind blowing in trees, etc.). Usually, such broad
