3D Room Geometry Reconstruction Using Audio-Visual Sensors

In this paper we propose a cuboid-based, air-tight indoor room geometry estimation method using a combination of audio-visual sensors. Existing vision-based 3D reconstruction methods are not applicable to scenes with transparent or reflective objects such as windows and mirrors. In this work we fuse multi-modal sensory information to overcome the limitations of purely visual reconstruction in complex scenes that include transparent and mirror surfaces. A full scene is captured by 360$^{\circ}$ cameras, and acoustic room impulse responses (RIRs) are recorded using a loudspeaker and a compact microphone array. Depth information of the scene is recovered by stereo matching of the captured images and by estimating the locations of the major acoustic reflectors from the sound. The coordinate systems of the audio and visual sensors are aligned into a unified reference frame, and plane elements are reconstructed from the audio-visual data. Finally, cuboid proxies are fitted to the planes to generate a complete room model. Experimental results show that the proposed system generates complete representations of the room structures regardless of transparent windows, featureless walls and shiny surfaces.
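As a rough illustration of how an acoustic reflector can be turned into a plane element, the sketch below assumes the standard image-source model: a first-order reflection appears to arrive from the mirror image of the loudspeaker, so the reflecting wall is the plane that perpendicularly bisects the segment between the loudspeaker and its image. The function names `reflector_plane` and `reflection_delay`, the example coordinates, and the speed of sound of 343 m/s are assumptions made for this example; it is not the estimation method used in the paper.

```python
import math


def reflector_plane(source, image_source):
    """Plane n.x = d that mirrors `source` onto `image_source`.

    Hypothetical helper: returns the unit normal n (pointing from the
    source towards its image) and the offset d of the bisecting plane.
    """
    diff = [i - s for s, i in zip(source, image_source)]
    norm = math.sqrt(sum(c * c for c in diff))
    if norm == 0.0:
        raise ValueError("source and image source coincide")
    n = [c / norm for c in diff]
    # The wall passes through the midpoint of the source/image segment.
    mid = [(s + i) / 2.0 for s, i in zip(source, image_source)]
    d = sum(a * b for a, b in zip(n, mid))
    return n, d


def reflection_delay(image_source, mic, speed_of_sound=343.0):
    """Arrival time (s) of the first-order reflection at the microphone.

    Under the image-source model the reflected path length equals the
    straight-line distance from the image source to the microphone.
    """
    return math.dist(image_source, mic) / speed_of_sound


# Assumed example: loudspeaker at x = 1 m, wall at x = 4 m,
# so the image source sits at x = 7 m.
normal, offset = reflector_plane((1.0, 2.0, 1.5), (7.0, 2.0, 1.5))
delay = reflection_delay((7.0, 2.0, 1.5), (3.0, 2.0, 1.5))
```

Once the audio and visual coordinate systems share a reference frame, a plane recovered this way can be treated like any plane obtained from stereo depth, which is what allows a glass wall that is invisible to stereo matching to still contribute to the room model.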
