Binaural localization and separation techniques

Abstract

Based on binaural signals, i.e. the signals observed at the two ears, a listener can localize and recognize different sound sources and then focus on one of them. For decades, researchers have tried to build a machine that can do the same under similar conditions. Despite these efforts, the human auditory system remains by far superior to any machine that has been devised. The topic of this thesis is computational techniques for the localization and separation of sources in binaural signals. To give an overview of the research areas that have addressed source localization and separation, we start with a review of existing techniques. This provides the background for the techniques that we propose subsequently.

Binaural localization: The most important cues for the localization of sound sources in binaural signals are the level and time differences between the ears. We propose a technique for the joint evaluation of these cues, in which noisy level difference estimates are combined with less noisy but ambiguous time difference estimates to provide accurate azimuth estimates. The proposed technique enables the localization of sources and their tracking in dynamic scenes.

Head model: Based on a study of the level and time differences as functions of azimuth angle for different heads, we propose a generic model that is parametrized only by the distance between the ears. This enables the use of the binaural localization technique mentioned above for a listener whose head-related transfer functions have not been measured.

Binaural separation: For the separation of sources, we propose a method based on spatial windowing in the azimuth parameter space.

Separation of overlapping partials: Finally, we propose a technique for the separation of overlapping partials in mixtures of harmonic instruments. The technique is based on the similarity of the temporal envelopes of the different partials of a harmonic note.
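The joint evaluation of level and time differences described in the abstract can be illustrated with a minimal sketch. It is not the thesis's actual method: the sine-law time difference model, the toy level difference model, the default ear distance of 0.18 m, the 15 dB level scale, and all function names are assumptions made purely for illustration. The idea shown is only that a wrapped interaural phase difference yields several candidate time differences, and a coarse level-based azimuth can be used to pick among them.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value


def itd_candidates(phase_diff, freq, max_itd):
    """Return every time difference consistent with a wrapped phase difference."""
    period = 1.0 / freq
    base = phase_diff / (2.0 * math.pi * freq)
    k_max = int(math.floor((max_itd + abs(base)) / period))
    candidates = [base + k * period for k in range(-k_max, k_max + 1)]
    # Keep only delays that are physically possible for this head size.
    return [t for t in candidates if abs(t) <= max_itd]


def azimuth_from_itd(itd, ear_distance):
    # Assumed sine-law head model: itd = (ear_distance / c) * sin(azimuth).
    s = itd * SPEED_OF_SOUND / ear_distance
    return math.asin(max(-1.0, min(1.0, s)))


def azimuth_from_ild(ild_db, ild_scale_db):
    # Assumed toy model: ild_db = ild_scale_db * sin(azimuth).
    s = ild_db / ild_scale_db
    return math.asin(max(-1.0, min(1.0, s)))


def joint_azimuth(phase_diff, ild_db, freq, ear_distance=0.18, ild_scale_db=15.0):
    """Pick the time-difference azimuth closest to the coarse level-based azimuth."""
    max_itd = ear_distance / SPEED_OF_SOUND
    coarse = azimuth_from_ild(ild_db, ild_scale_db)
    candidates = itd_candidates(phase_diff, freq, max_itd)
    if not candidates:
        return coarse  # fall back to the level cue alone
    azimuths = [azimuth_from_itd(t, ear_distance) for t in candidates]
    return min(azimuths, key=lambda a: abs(a - coarse))
```

Calling `joint_azimuth(phase_diff, ild_db, freq)` with a phase difference in radians, a level difference in dB, and a frequency in Hz returns an azimuth in radians. In this sketch the level cue only selects among the time-difference candidates; the final value comes from the time difference, which reflects the abstract's description of the level cue as noisy and the time cue as accurate but ambiguous.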
