Paper information - Multi-modal Visual Data Registration and Web-based Visualisation

Multi-modal Visual Data Registration and Web-based Visualisation

Recent developments in video and sensing technology can lead to large amounts of digital media data. Current media production relies on video from the principal camera together with a wide variety of heterogeneous sources of supporting data (photos, LiDAR point clouds, witness video cameras, HDRI and depth imagery). Registration of visual data acquired from various 2D and 3D sensing modalities is challenging because current matching and registration methods are not suited to the differences in formats and noise types of multi-modal data. A combined 2D/3D visualisation of the registered data allows an integrated overview of the entire dataset, and a web-based context presents several advantages for such a visualisation. In this paper we propose a unified framework for registration and visualisation of this type of visual media data. A new feature description and matching method is proposed, adaptively considering local geometry, semi-global geometry and colour information in the scene for more robust registration. The resulting registered 2D/3D multi-modal visual data is too large to be downloaded and viewed directly in the web browser while maintaining an acceptable user experience. We therefore employ hierarchical techniques for compression and restructuring to enable efficient transmission and visualisation over the web, leading to interactive visualisation as registered point clouds, 2D images and videos in the browser, improving on current state-of-the-art techniques for web-based visualisation of big media data. This is the first unified 3D web-based visualisation of multi-modal visual media production datasets. The proposed pipeline is tested on big multi-modal datasets typical of film and broadcast production, which are made publicly available. The proposed feature description method shows twice the feature matching precision and more stable registration performance compared with existing 3D feature descriptors.
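The abstract does not specify which hierarchical compression and restructuring scheme is used, so the following is only a generic illustration of the coarse-to-fine idea, not the authors' method. It assumes a simple voxel-grid hierarchy in which each level halves the voxel size, so a browser client could show a small coarse level first and refine progressively. The names `voxel_downsample` and `build_levels` and all parameters are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]
VoxelKey = Tuple[int, int, int]


def voxel_downsample(points: List[Point], voxel_size: float) -> List[Point]:
    """Replace all points that fall in the same cubic voxel by their centroid."""
    if voxel_size <= 0:
        raise ValueError("voxel_size must be positive")
    buckets: Dict[VoxelKey, List[Point]] = defaultdict(list)
    for p in points:
        # Floor division maps a coordinate to its integer voxel index.
        key = (
            int(p[0] // voxel_size),
            int(p[1] // voxel_size),
            int(p[2] // voxel_size),
        )
        buckets[key].append(p)
    result: List[Point] = []
    for key in sorted(buckets):  # sorted for a deterministic output order
        pts = buckets[key]
        n = len(pts)
        result.append((
            sum(q[0] for q in pts) / n,
            sum(q[1] for q in pts) / n,
            sum(q[2] for q in pts) / n,
        ))
    return result


def build_levels(points: List[Point], coarsest_voxel: float,
                 num_levels: int) -> List[List[Point]]:
    """Build coarse-to-fine levels (assumed scheme: voxel size halves per level)."""
    return [
        voxel_downsample(points, coarsest_voxel / (2 ** level))
        for level in range(num_levels)
    ]


if __name__ == "__main__":
    # Hypothetical input: a regular 4 x 4 x 4 grid of points with spacing 0.25.
    grid = [(i * 0.25, j * 0.25, k * 0.25)
            for i in range(4) for j in range(4) for k in range(4)]
    for level, pts in enumerate(build_levels(grid, 1.0, 3)):
        print(f"level {level}: {len(pts)} points")
```

Under this assumption a viewer would request level 0 first and fetch finer levels only as needed. A production system would more likely store per-level differences in an octree rather than recomputing each level from the full cloud.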
