Big Multimodal Visual Data Registration for Digital Media Production

Modern digital media production relies on various heterogeneous sources of supporting data (snapshots, LiDAR, HDR and depth images) as well as videos from cameras. Recent developments in camera and sensing technology have led to huge amounts of digital media data. Managing and processing this heterogeneous data consumes enormous resources. In this chapter, we present a multimodal visual data registration framework. A new feature description and matching method for multimodal data is introduced, which considers local and semi-global geometry and colour information in the scene for more robust registration. Combined 2D/3D visualisation of the registered data allows an integrated overview of the entire dataset. The proposed framework is tested on multimodal datasets from film and broadcast production, which are made publicly available. The resulting automated registration of multimodal datasets supports more efficient creative decision making in media production, enabling data visualisation, search and verification across a wide variety of assets.
