Method for creating location-specific audio textures

An approach is proposed for creating location-specific audio textures for virtual location-exploration services. The approach creates audio textures by processing a small amount of audio recorded at a given location, providing a cost-effective way to produce a versatile audio signal that characterizes the location. The resulting texture is non-repetitive and preserves the location-specific characteristics of the audio scene, without the need to collect a large amount of audio from each location. The method consists of two stages: analysis and synthesis. In the analysis stage, the source audio recording is divided into homogeneous segments. In the synthesis stage, the audio texture is created by randomly drawing segments from the source audio such that consecutive segments are timbrally similar near their shared boundary. Results from listening experiments show no statistically significant difference in audio quality or location-specificity between the created audio textures and excerpts of the original recordings. Therefore, the proposed audio textures could be utilized in virtual location-exploration services. Examples of source signals and audio textures created from them are available at http://www.cs.tut.fi/~heittolt/audiotexture.
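The synthesis stage described above can be illustrated with a minimal sketch. The abstract does not specify the features, the distance measure, or the exact selection rule, so everything below is an assumption made for illustration: each segment is represented by hypothetical timbre vectors near its start and end (`start_feats`, `end_feats`), similarity is measured with Euclidean distance, and the next segment is drawn uniformly at random from the `k` closest candidates.

```python
import numpy as np

def synthesize_order(start_feats, end_feats, n_out, k=3, seed=0):
    """Return a list of segment indices forming a texture.

    start_feats[i], end_feats[i]: hypothetical timbre vectors (for example,
    averaged spectral features) near the start and end of segment i.
    The selection rule here is an illustrative assumption, not the paper's.
    """
    start_feats = np.asarray(start_feats, dtype=float)
    end_feats = np.asarray(end_feats, dtype=float)
    n = len(start_feats)
    if n < 2:
        raise ValueError("need at least two segments")
    k = min(k, n - 1)  # the current segment is never a candidate

    rng = np.random.default_rng(seed)
    current = int(rng.integers(n))
    order = [current]
    while len(order) < n_out:
        # Distance from the end of the current segment to the start of each candidate.
        dist = np.linalg.norm(start_feats - end_feats[current], axis=1)
        dist[current] = np.inf  # forbid immediate repetition
        candidates = np.argsort(dist)[:k]  # k timbrally closest segments
        current = int(rng.choice(candidates))  # random draw among them
        order.append(current)
    return order
```

Concatenating the source segments in the returned order (ideally with a short cross-fade at each boundary) would yield the texture. Drawing from the `k` nearest candidates rather than always taking the single nearest one is what keeps the output from looping.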
