Multimodal Visual Data Registration for Web-Based Visualization in Media Production

Recent developments in video and sensing technology have led to large volumes of digital media data. Current media production relies on videos from the principal camera together with a wide variety of heterogeneous sources of supporting data (photos, light detection and ranging point clouds, witness video camera, high dynamic range imaging, and depth imagery). Registration of visual data acquired from various 2D and 3D sensing modalities is challenging because differences in structure, format, and noise characteristics make current matching and registration methods ill-suited to multimodal data. A combined 2D/3D visualization of this registered data allows an integrated overview of the entire data set. For such a visualization, a Web-based context presents several advantages. In this paper, we propose a unified framework for registration and visualization of this type of visual media data. A new feature description and matching method is proposed, adaptively considering local geometry, semiglobal geometry, and color information in the scene for more robust registration. The resulting registered 2D/3D multimodal visual data are too large to be downloaded and viewed directly in a Web browser while maintaining an acceptable user experience. Thus, we employ hierarchical techniques for compression and restructuring to enable efficient transmission and visualization over the Web, leading to interactive visualization as registered point clouds, 2D images, and videos in the browser, improving on the current state-of-the-art techniques for Web-based visualization of big media data. This is the first unified 3D Web-based visualization of multimodal visual media production data sets. The proposed pipeline is tested on large multimodal data sets typical of film and broadcast production, which are made publicly available.
The proposed feature description method achieves twice the feature-matching precision of existing 3D feature descriptors, together with more stable registration performance.
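
The abstract does not say which hierarchical restructuring scheme is used, so the following is only a minimal illustrative sketch of one common approach: a coarse-to-fine level-of-detail stack built by voxel-grid subsampling, so that a browser could fetch a coarse point set first and refine it later. The function names `voxel_downsample` and `build_lod_levels`, the cell sizes, and the sample points are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from math import floor


def voxel_downsample(points, cell):
    """Return one centroid per occupied voxel of edge length `cell`.

    `points` is an iterable of (x, y, z) tuples. This is a generic
    voxel-grid filter, assumed here for illustration only.
    """
    buckets = defaultdict(list)
    for p in points:
        # Integer voxel index along each axis.
        key = tuple(floor(c / cell) for c in p)
        buckets[key].append(p)
    # Average the points that fall in each voxel.
    return [
        tuple(sum(axis) / len(pts) for axis in zip(*pts))
        for pts in buckets.values()
    ]


def build_lod_levels(points, coarsest_cell, num_levels):
    """Build a coarse-to-fine list of point sets.

    Level 0 uses `coarsest_cell`; the cell size halves at each level,
    so later levels keep progressively more detail.
    """
    points = list(points)
    return [
        voxel_downsample(points, coarsest_cell / (2 ** level))
        for level in range(num_levels)
    ]


if __name__ == "__main__":
    # Hypothetical toy cloud; a real scan would hold millions of points.
    cloud = [(0.1, 0.1, 0.1), (0.2, 0.2, 0.2), (0.9, 0.9, 0.9), (1.5, 1.5, 1.5)]
    for level, pts in enumerate(build_lod_levels(cloud, 1.0, 3)):
        print(f"level {level}: {len(pts)} points")
```

In a design like this, a viewer would request level 0 first for a fast overview and fetch finer levels only for the region currently in view; a production system would more likely store the levels in an octree and compress each node before transmission.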
