An Automatic Three-Dimensional Scene Reconstruction System Using Crowdsourced Geo-Tagged Videos

Automatic 3-D scene reconstruction is a useful technique in modern intelligent systems. Reconstructing a scene from video sequences requires selecting representative video frames. Most previous work extracts key frames automatically with content-based techniques, which ignore the geographic information of the frames and can be computationally expensive. In this paper, we propose a new key frame selection scheme based on the geographic cues of videos. Driven by the popularity of smartphones, an increasing number of user-generated videos are being collected, and various sensor data (e.g., geospatial metadata) can be conveniently acquired and fused with them to create geo-tagged mobile videos. Large repositories of media content are now geo-tagged automatically. Our technique uses this underlying geo-metadata to select the most representative frames. We first eliminate irrelevant frames in which the candidate 3-D object does not appear. We then select a fixed number of key frames such that they maximally cover the candidate 3-D object or scene. Comprehensive experiments demonstrate the high quality of the reconstructed 3-D objects, and the execution time is reduced by 90%.
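The two-step selection summarized in the abstract (discard frames that do not show the object, then pick a fixed number of frames that maximally cover it) can be illustrated with a small sketch. It is not the authors' algorithm. It assumes a simplified 2-D field-of-view model for each frame (camera position in local metric coordinates, compass heading, horizontal viewing angle, maximum visible distance), approximates the object by a single point, and measures coverage as the fraction of 36 angular sectors around the object that the chosen frames view it from. The selection step is a plain greedy maximum-coverage heuristic. `Frame`, `sees`, `covered_bins`, `select_key_frames` and the 30° half-width are hypothetical names and parameters.

```python
import math
from dataclasses import dataclass


@dataclass
class Frame:
    """Hypothetical geo-metadata of one video frame (assumed model)."""
    frame_id: int
    x: float         # camera east coordinate in a local metric plane (m)
    y: float         # camera north coordinate in a local metric plane (m)
    heading: float   # compass heading, degrees clockwise from north
    fov: float       # horizontal viewing angle (degrees)
    max_dist: float  # maximum visible distance (m)


def bearing(dx, dy):
    """Compass bearing in degrees (0 = north/+y, clockwise) of vector (dx, dy)."""
    return math.degrees(math.atan2(dx, dy)) % 360.0


def angle_diff(a, b):
    """Smallest absolute difference between two angles in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def sees(frame, ox, oy):
    """True if the object point (ox, oy) lies inside the frame's view sector."""
    dx, dy = ox - frame.x, oy - frame.y
    dist = math.hypot(dx, dy)
    if dist == 0 or dist > frame.max_dist:
        return False
    return angle_diff(bearing(dx, dy), frame.heading) <= frame.fov / 2


def covered_bins(frame, ox, oy, n_bins=36, half_width=30.0):
    """Angular sectors around the object that this frame observes.

    Assumption: the camera sees the side of the object within
    +/- half_width degrees of the object-to-camera bearing.
    """
    view_dir = bearing(frame.x - ox, frame.y - oy)
    step = 360.0 / n_bins
    return {b for b in range(n_bins)
            if angle_diff((b + 0.5) * step, view_dir) <= half_width}


def select_key_frames(frames, ox, oy, k, n_bins=36):
    """Return (ids of up to k key frames, fraction of sectors covered)."""
    # Step 1: drop frames whose view sector does not contain the object.
    relevant = [f for f in frames if sees(f, ox, oy)]
    cover = {f.frame_id: covered_bins(f, ox, oy, n_bins) for f in relevant}
    # Step 2: greedy maximum coverage; always picks min(k, len(relevant))
    # frames, taking the one adding the most new sectors (ties -> earliest).
    remaining, chosen, covered = list(relevant), [], set()
    while remaining and len(chosen) < k:
        best = max(remaining, key=lambda f: len(cover[f.frame_id] - covered))
        chosen.append(best.frame_id)
        covered |= cover[best.frame_id]
        remaining.remove(best)
    return chosen, len(covered) / n_bins


if __name__ == "__main__":
    # Object at the origin; frame 3 looks away from it, frame 1 duplicates frame 0.
    frames = [
        Frame(0, 0.0, -50.0, 0.0, 60.0, 100.0),
        Frame(1, 0.0, -40.0, 0.0, 60.0, 100.0),
        Frame(2, 50.0, 0.0, 270.0, 60.0, 100.0),
        Frame(3, 0.0, 50.0, 0.0, 60.0, 100.0),
        Frame(4, -50.0, 0.0, 90.0, 60.0, 100.0),
    ]
    print(*select_key_frames(frames, 0.0, 0.0, k=3))  # [0, 2, 4] 0.5
```

Greedy selection is a standard approximation for maximum-coverage problems. A real system would also need to project GPS latitude/longitude into local metric coordinates and model the object's extent, both of which this sketch omits.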

Meng Wang | Luming Zhang | Richang Hong | Weisheng Li | Maofu Liu
