A Manifold Learning Approach for Personalizing HRTFs from Anthropometric Features

We present a new anthropometry-based method for personalizing head-related transfer functions (HRTFs) that applies manifold learning over both azimuth and elevation angles with a single nonlinear regression model. The core of our approach is a domain-specific adaptation of Isomap, a nonlinear dimensionality reduction technique, applied to the intraconic component of HRTFs obtained from a spectral decomposition. The intraconic components encode the cues most important for HRTF individualization and leave out subject-independent cues. First, we modify Isomap's graph construction procedure to integrate relevant prior knowledge of spatial audio into a single manifold for all subjects, exploiting the existing correlations among HRTFs across individuals, directions, and ears. Then, to preserve the multifactor nature of HRTFs (i.e., subject, direction, and frequency), we train a single artificial neural network to predict low-dimensional HRTFs from anthropometric features. Finally, we reconstruct the HRTF from its estimated low-dimensional version using a neighborhood-based reconstruction approach. Our findings show that introducing prior knowledge into Isomap's manifold is a powerful way to capture the underlying factors of spatial hearing. Our experiments show, with p-values below 0.05, that our approach outperforms variants that use either PCA-based linear dimensionality reduction or the full HRTF in the intermediate stages.
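The abstract does not specify how the Isomap graph is modified or how the reconstruction weights are chosen, so the following is only a minimal sketch of the pipeline's two manifold steps under loudly stated assumptions. HRTF feature vectors are stacked as rows of `X`, with hypothetical integer labels `subject` and `direction`; the "prior knowledge" is modeled solely as extra graph edges between samples of different subjects that share a direction (an assumption; the method described above also exploits correlations across ears); geodesics and classical MDS follow standard Isomap; and reconstruction is an inverse-distance-weighted average of the k nearest training HRTFs in the embedding (an assumed weighting). The neural-network regression from anthropometry to the embedding is omitted.

```python
import numpy as np


def geodesic_distances(W):
    """All-pairs shortest paths (Floyd-Warshall) on a dense weight matrix.

    W[i, j] is the edge length, or np.inf where there is no edge.
    """
    D = W.copy()
    for m in range(D.shape[0]):
        # Relax every path i -> j through intermediate node m.
        D = np.minimum(D, D[:, m:m + 1] + D[m:m + 1, :])
    return D


def isomap_with_prior(X, subject, direction, k=7, dim=2):
    """Isomap embedding whose neighbourhood graph is augmented by prior edges.

    X: (n, F) HRTF feature vectors (e.g. intraconic magnitude components).
    subject, direction: (n,) integer labels (hypothetical inputs).
    Assumed prior: samples of different subjects measured at the same
    direction are always linked, on top of the usual k-NN edges.
    """
    n = X.shape[0]
    E = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    W = np.full((n, n), np.inf)
    np.fill_diagonal(W, 0.0)

    # Standard Isomap step: symmetric k-nearest-neighbour graph.
    for i in range(n):
        order = np.argsort(E[i])
        nbrs = order[order != i][:k]
        W[i, nbrs] = E[i, nbrs]
        W[nbrs, i] = E[i, nbrs]

    # Prior-knowledge edges (assumption): same direction, different subject.
    same_dir = (direction[:, None] == direction[None, :]) & (
        subject[:, None] != subject[None, :]
    )
    W[same_dir] = E[same_dir]

    # Geodesic distances; a disconnected graph leaves np.inf entries here,
    # which would turn the MDS step below into NaNs.
    G = geodesic_distances(W)

    # Classical MDS on the squared geodesic distances.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (G ** 2) @ J
    vals, vecs = np.linalg.eigh(B)
    top = np.argsort(vals)[::-1][:dim]
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))


def reconstruct(y_hat, Y, X, k=3, eps=1e-12):
    """Map a predicted low-dimensional point back to HRTF space.

    Assumed scheme: inverse-distance-weighted average of the training HRTFs
    whose embeddings are the k nearest neighbours of y_hat.
    """
    d = np.linalg.norm(Y - y_hat, axis=1)
    nbrs = np.argsort(d)[:k]
    w = 1.0 / (d[nbrs] + eps)
    w /= w.sum()
    return w @ X[nbrs]
```

In use, one would embed the training HRTFs with `isomap_with_prior`, fit a regressor from anthropometric features to rows of the embedding, and pass each prediction to `reconstruct`. The dense Floyd-Warshall loop is O(n³) and only suitable for toy data; for a full database, a sparse solver such as `scipy.sparse.csgraph.shortest_path` would be the usual replacement.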
