Scene-aware audio for 360° videos

Although 360° cameras ease the capture of panoramic footage, it remains challenging to add realistic 360° audio that blends into the captured scene and is synchronized with the camera motion. We present a method for adding scene-aware spatial audio to 360° videos of typical indoor scenes, using only a conventional mono-channel microphone and a speaker. We observe that the late reverberation of a room's impulse response is usually diffuse, both spatially and directionally. Exploiting this fact, we synthesize the directional impulse response between any source and listening location by combining a synthesized early reverberation part with a measured late reverberation tail. The early reverberation is simulated with a geometric acoustic simulation and then enhanced using a frequency modulation method to capture room resonances. The late reverberation is extracted from a recorded impulse response, using a carefully chosen time duration that separates it from the early reverberation. Our validations show that the synthesized spatial audio closely matches recordings made with ambisonic microphones. Lastly, we demonstrate the strengths of our method in several applications.
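The core idea of splicing a simulated early part onto a measured late tail can be illustrated with a minimal sketch. It assumes both impulse responses are single-channel sample lists that share the same time origin, that the separation time `t_mix` is already known, and that a short linear crossfade (the `fade` length here is an arbitrary assumed value) hides the seam. The function name `splice_impulse_response` is hypothetical, and this is not the authors' implementation. Because the late tail is treated as diffuse, the same measured tail can be reused for every source and listener pair, and only the early part needs to be re-simulated.

```python
def splice_impulse_response(early, late, fs, t_mix, fade=0.005):
    """Hypothetical sketch: combine a simulated early IR with a measured late tail.

    early: simulated early reverberation samples (index 0 = emission time).
    late:  measured impulse response samples aligned to the same time origin.
    fs:    sample rate in Hz.
    t_mix: separation time in seconds between the early and late parts.
    fade:  crossfade length in seconds (an assumed value, not from the paper).
    """
    n_mix = int(round(t_mix * fs))
    n_fade = max(1, int(round(fade * fs)))
    # Center the crossfade window on the separation sample.
    start = max(0, n_mix - n_fade // 2)
    end = start + n_fade
    length = max(len(early), len(late))
    out = [0.0] * length
    for i in range(length):
        e = early[i] if i < len(early) else 0.0
        tail = late[i] if i < len(late) else 0.0
        if i < start:
            w = 0.0  # before the window: early part only
        elif i >= end:
            w = 1.0  # after the window: measured late tail only
        else:
            w = (i - start) / n_fade  # linear ramp toward the late tail
        out[i] = (1.0 - w) * e + w * tail
    return out
```

For example, `splice_impulse_response([1.0] * 10, [2.0] * 10, fs=1000, t_mix=0.005, fade=0.002)` keeps the early samples up to index 4, blends them at index 5, and uses the measured tail from index 6 onward. A linear crossfade is the simplest choice; an equal-power fade or an energy match at the boundary would be reasonable alternatives.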