An Audio-Visual Method for Room Boundary Estimation and Material Recognition

In applications such as virtual and augmented reality, plausible and coherent audio-visual reproduction requires a detailed understanding of the acoustics of the reference scene, which in turn requires knowledge of the scene geometry and the materials of its surfaces. In this paper, we present an audio-visual approach to acoustic scene understanding. We propose a novel material recognition algorithm that exploits the information carried by acoustic signals, using acoustic absorption coefficients as features. The training dataset combines values available in the literature with additional labeled data that we recorded in a small room with a short reverberation time (RT60). Classic machine learning methods are used, and the model is validated on data recorded in five rooms of different sizes and RT60s. The estimated materials are then used to label the room boundaries reconstructed by a vision-based method. Results show 89% and 80% agreement between the estimated and reference room volumes and materials, respectively.
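The idea of classifying materials from absorption coefficients can be illustrated with a minimal sketch. The abstract does not name the classifier, so a distance-weighted k-nearest-neighbor rule is assumed here purely for illustration. The six-band feature layout, the `TRAINING` table, and the `classify` helper are all hypothetical, and the coefficient values are placeholders rather than measured data.

```python
from collections import defaultdict
from math import dist

# Hypothetical training set: each feature vector holds absorption coefficients
# in six octave bands (assumed 125 Hz to 4 kHz). The values are illustrative
# placeholders, not measurements from the paper or the literature.
TRAINING = [
    ((0.02, 0.03, 0.03, 0.04, 0.05, 0.07), "concrete"),
    ((0.01, 0.02, 0.02, 0.03, 0.04, 0.05), "concrete"),
    ((0.08, 0.24, 0.57, 0.69, 0.71, 0.73), "carpet"),
    ((0.10, 0.20, 0.50, 0.65, 0.70, 0.75), "carpet"),
    ((0.35, 0.25, 0.18, 0.12, 0.07, 0.04), "glass"),
    ((0.30, 0.22, 0.15, 0.10, 0.06, 0.04), "glass"),
]


def classify(features, training=TRAINING, k=3):
    """Return the material label chosen by a distance-weighted k-NN vote."""
    # Keep the k training samples closest to the query (Euclidean distance).
    neighbours = sorted(training, key=lambda item: dist(features, item[0]))[:k]
    votes = defaultdict(float)
    for vector, label in neighbours:
        # Closer neighbours contribute larger weights; the small constant
        # avoids division by zero for an exact match.
        votes[label] += 1.0 / (dist(features, vector) + 1e-9)
    return max(votes, key=votes.get)


if __name__ == "__main__":
    estimated = (0.09, 0.22, 0.55, 0.66, 0.70, 0.74)  # hypothetical estimate
    print(classify(estimated))
```

In a pipeline of the kind the abstract outlines, the predicted label would then be attached to the corresponding room boundary reconstructed by the vision-based method.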
