Ray Space Transform Interpolation with Convolutional Autoencoder

In this paper we propose an algorithm for reconstructing the Ray Space Transform (RST) using neural networks. In particular, our aim is to reconstruct the magnitude of the RST acquired from a linear microphone array as if the array were composed of a larger number of microphones. This is useful for applications that need a higher RST resolution when only a limited number of microphones can be used because of practical constraints or physical limitations. The proposed solution leverages recent advances in deep learning, as it is based on a fully convolutional autoencoder. To validate our method, we show through a simulation campaign that sound source localization improves when the reconstructed RST is used instead of the original RST.
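The abstract does not detail how localization is performed on the RST. The sketch below only illustrates the ray-space geometry that such localization can exploit. It assumes the usual ray-space convention in which the array lies on the y-axis and a ray is the line y = m x + q. Under that convention, every ray through a source at (xs, ys) satisfies q = ys - m xs, so the rays through the source form a straight line in the (m, q) plane. The function name `localize_from_rays` and the input peak list are hypothetical, and an ordinary least-squares line fit stands in for whatever fitting procedure the authors actually use.

```python
"""Minimal sketch: locate a point source from peaks picked in the ray space.

Assumption (not stated in the abstract): the array lies on the y-axis and a
ray is the line y = m*x + q. Rays through a source at (xs, ys) satisfy
q = ys - m*xs, i.e. they lie on a line in the (m, q) plane with slope -xs
and intercept ys.
"""
from typing import List, Tuple


def localize_from_rays(rays: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Fit q = ys - m*xs to (m, q) pairs by ordinary least squares.

    `rays` is a hypothetical list of (m, q) peaks, e.g. one per sub-array,
    picked from an RST magnitude map. Returns the estimated (xs, ys).
    """
    n = len(rays)
    if n < 2:
        raise ValueError("need at least two rays")
    mean_m = sum(m for m, _ in rays) / n
    mean_q = sum(q for _, q in rays) / n
    s_mm = sum((m - mean_m) ** 2 for m, _ in rays)
    if s_mm == 0.0:
        raise ValueError("rays must have distinct slopes")
    s_mq = sum((m - mean_m) * (q - mean_q) for m, q in rays)
    slope = s_mq / s_mm                   # estimate of -xs
    intercept = mean_q - slope * mean_m   # estimate of ys
    return -slope, intercept


if __name__ == "__main__":
    xs, ys = 2.0, 0.5  # hypothetical source position in metres
    # Synthetic, noise-free rays that all pass through the source
    rays = [(m, ys - m * xs) for m in (-0.4, -0.1, 0.2, 0.5)]
    print(localize_from_rays(rays))
```

With real data the (m, q) peaks are noisy and may contain outliers, so a robust fit (for example RANSAC) is usually preferable to plain least squares. Under this view, a sharper RST magnitude map yields better-placed peaks and hence a more accurate line fit, which is one way to read the localization gain claimed above.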