Synthetic individual binaural audio delivery by pinna image processing

Purpose – The purpose of this paper is to present a system for customized binaural audio delivery based on the extraction of relevant features from a 2-D representation of the listener’s pinna.

Design/methodology/approach – The most significant pinna contours are extracted by means of multi-flash imaging, and they provide values for the parameters of a structural head-related transfer function (HRTF) model. The HRTF model spatializes a given sound file according to the listener’s head orientation, tracked by sensor-equipped headphones, with respect to the virtual sound source.

Findings – A preliminary localization test shows that the model is able to statically render the elevation of a virtual sound source better than non-individual HRTFs.

Research limitations/implications – Results encourage a deeper analysis of the psychoacoustic impact that the individualized HRTF model has on perceived elevation of virtual sound sources.

Practical implications – The model has low complexity and is suitable for implem...
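The abstract states that the sound is spatialized according to the listener's head orientation with respect to the virtual sound source. As a minimal sketch of that one step, and not the authors' implementation, the snippet below computes the source azimuth in the head frame. It assumes yaw-only tracking, angles in degrees, and a shared sign convention for source azimuth and head yaw; the function name `relative_azimuth` is hypothetical.

```python
def relative_azimuth(source_az_deg: float, head_yaw_deg: float) -> float:
    """Return the source azimuth seen from the head, wrapped to [-180, 180).

    Assumption: both angles are measured in degrees in the same world frame
    with the same sign convention, and only head yaw is tracked.
    """
    # Subtract the head rotation, then wrap the result into [-180, 180).
    return (source_az_deg - head_yaw_deg + 180.0) % 360.0 - 180.0


if __name__ == "__main__":
    # A source fixed at 30 degrees while the head turns through several yaws.
    for yaw in (0.0, 30.0, 90.0, -170.0):
        print(yaw, relative_azimuth(30.0, yaw))
```

A renderer built this way would re-evaluate the relative angle each time the head tracker reports a new orientation and pass it to the HRTF model, so that the virtual source stays fixed in the world rather than moving with the head.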
