Object-based audio capture: separating acoustically-mixed sounds

This thesis investigates how a digital system can recognize and isolate individual sound sources, or audio objects, in an environment containing several sounds. Its main contribution is the application of object-based audio capture to unconstrained real-world environments. Several potential applications for object-based audio capture are outlined, and current blind source separation and deconvolution (BSSD) algorithms that have been applied to acoustically-mixed sounds are reviewed. The acoustic issues in object-based audio capture are explained, including an argument that overdetermined mixtures yield better source separation. The difficulties imposed by a real-world environment are discussed in detail, followed by several experiments that compare how filter configuration, filter length, and room reverberation affect the performance of object-based audio capture. A real-world implementation of object-based audio capture in a conference room with two people speaking is also discussed. The thesis concludes with future directions for research in object-based audio capture.

Author: Alexander George Westner

Thesis Supervisor: V. Michael Bove, Jr., Principal Research Scientist, MIT Media Laboratory

This work was supported by the Digital Life Consortium at the Media Laboratory.
