Learning the Direction of a Sound Source Using Head Motions and Spectral Features

In this paper we address the problem of localizing a sound source by combining binaural or monaural spectral features with head movements. Since a number of psychophysical and behavioral studies suggest that spatial hearing is both listener-dependent and dynamic, we propose to address this problem within the framework of unsupervised learning. More precisely, our method retrieves an intrinsic low-dimensional parameterization from the high-dimensional spectral representation of the acoustic input. We address both binaural and monaural localization, with both static and dynamic cues. We show that the recovered low-dimensional representations are homeomorphic to the two-dimensional manifold associated with the motor states of a robotic head with two rotational degrees of freedom. We describe the experimental setup and protocols that allow us to gather acoustic data sets with ground truth for both the emitter-to-listener directions and precise head motions. We validate our method through extensive experiments that consist of classifying acoustic vectors from a test set using a manifold learned from a separate training set. Our method strongly contrasts with current approaches to sound localization because it emphasizes the role of learning.
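The abstract does not name the manifold-learning algorithm, so the following is only an illustrative sketch of the learn-then-classify pipeline it describes. It assumes a Laplacian eigenmap (Belkin and Niyogi) as the embedding method. Real spectral features are replaced by a hypothetical function `synthetic_spectra` that maps a 15 × 15 grid of (pan, tilt) motor states to 64-dimensional vectors. The out-of-sample step in the hypothetical `classify` helper, which averages the embedding coordinates of the k nearest training vectors, is also an assumed heuristic and not the paper's protocol.

```python
import numpy as np


def synthetic_spectra(angles, dim=64, noise=0.0, noise_seed=None):
    """Hypothetical stand-in for spectral features: a smooth nonlinear map from
    2-D motor states (pan, tilt in radians) to `dim`-dimensional vectors.
    The map is fixed (seed 0) so training and test sets share it."""
    map_rng = np.random.default_rng(0)
    w = map_rng.normal(size=(2, dim))              # random "frequency" directions
    b = map_rng.uniform(0.0, 2.0 * np.pi, size=dim)
    clean = np.sin(angles @ w + b)
    return clean + noise * np.random.default_rng(noise_seed).normal(size=clean.shape)


def laplacian_eigenmap(X, n_neighbors=8, n_components=2):
    """Laplacian eigenmap: heat-kernel weights on a symmetrised k-NN graph,
    then the generalised eigenproblem L y = lambda D y."""
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    np.fill_diagonal(d2, np.inf)                   # exclude self-matches
    nbrs = np.argsort(d2, axis=1)[:, :n_neighbors]
    rows = np.repeat(np.arange(n), n_neighbors)
    cols = nbrs.ravel()
    t = np.median(d2[rows, cols])                  # heat-kernel bandwidth (heuristic)
    W = np.zeros((n, n))
    W[rows, cols] = np.exp(-d2[rows, cols] / t)
    W = np.maximum(W, W.T)                         # make the graph undirected
    deg = W.sum(axis=1)
    L = np.diag(deg) - W
    # Solve L y = lambda D y through the symmetric form D^{-1/2} L D^{-1/2}.
    inv_sqrt = 1.0 / np.sqrt(deg)
    _, vecs = np.linalg.eigh(inv_sqrt[:, None] * L * inv_sqrt[None, :])
    # Column 0 is the trivial constant solution (lambda = 0); drop it.
    Y = inv_sqrt[:, None] * vecs[:, 1:n_components + 1]
    return Y, deg


def classify(X_train, Y_train, labels, X_test, k=4):
    """Hypothetical out-of-sample step: place each test vector in the embedding
    at the mean of its k nearest training vectors' coordinates, then return the
    motor-state label of the closest training point in that embedding."""
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=-1)
    nbrs = np.argsort(d2, axis=1)[:, :k]
    Y_test = Y_train[nbrs].mean(axis=1)
    e2 = ((Y_test[:, None, :] - Y_train[None, :, :]) ** 2).sum(axis=-1)
    return labels[np.argmin(e2, axis=1)]


if __name__ == "__main__":
    g = np.linspace(-1.0, 1.0, 15)
    pan, tilt = np.meshgrid(g, g)
    train_angles = np.column_stack([pan.ravel(), tilt.ravel()])   # (225, 2)
    X_train = synthetic_spectra(train_angles)
    Y_train, _ = laplacian_eigenmap(X_train)
    test_angles = np.random.default_rng(1).uniform(-1.0, 1.0, size=(50, 2))
    X_test = synthetic_spectra(test_angles, noise=0.05, noise_seed=2)
    pred = classify(X_train, Y_train, train_angles, X_test)
    err = np.abs(pred - test_angles).max(axis=1)
    print("median per-axis angle error (rad):", np.median(err))
```

One informal way to examine the kind of result the abstract reports is to plot `Y_train` coloured by pan and by tilt. If the embedding looks like a (possibly rotated or warped) copy of the motor-state grid, that is consistent with a homeomorphic recovery. This is a visual check, not a proof.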
